CLONING OF UDP-N-ACETYLGLUCOSAMINE:α-3-D-MANNOSIDE
β-1,2-N-ACETYLGLUCOSAMINYLTRANSFERASE I


BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to DNA sequences for the
human and rabbit enzymes which control the conversion of
high mannose to hybrid and complex N-glycans,
UDP-N-acetylglucosamine:α-3-D-mannoside
β-1,2-N-acetylglucosaminyltransferase I (GnT I), plasmids containing
such DNA sequences, transformed cells containing such
plasmids, and a method for converting high mannose
glycoproteins to branched N-glycan glycoproteins.

Discussion of the Background 

The biosynthesis of highly branched N- and O-glycans is
important to many biological phenomena (Rademacher et al
(1988) Ann. Rev. Biochem., vol. 57, 785-838). For example,
baby hamster kidney cells transformed either by polyoma
virus or by Rous sarcoma virus show a two-fold increase in
one of the N-acetylglucosaminyltransferases
(GlcNAc-transferase V) involved in the synthesis of highly branched
complex N-glycans (Pierce et al (1986) J. Biol. Chem., vol.
261, 10772-10777; Yamashita et al (1985) J. Biol. Chem.,
vol. 260, 3963-3969). All N-glycans share the common core
structure Manα1-6(Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAcβ-Asn.
Complex N-glycans have "antennae" or branches attached to
this core. The antennae are initiated by the action of at
least five Golgi-localized membrane-bound
GlcNAc-transferases designated GnT I, II, IV, V and VI
(Schachter et al (1989) Methods Enzymol., vol. 179, 351-396)
and may be further elongated by the addition of D-galactose,
L-fucose and sialic acid residues. Complex N-glycans may be
"bisected" by a GlcNAc residue attached in β1-4 linkage to
the β-linked Man of the core due to the action of
GlcNAc-transferase III (GnT III).

The conversion of high-mannose to complex and hybrid
N-glycans is controlled by UDP-GlcNAc:α-3-D-mannoside
β-1,2-N-acetylglucosaminyltransferase I (GnT I, EC 2.4.1.101), which
catalyzes the reaction:

UDP-GlcNAc + (Manα1-6[Manα1-3]Manα1-6)(Manα1-3)Manβ1-4R →
(Manα1-6[Manα1-3]Manα1-6)(GlcNAcβ1-2Manα1-3)Manβ1-4R + UDP,

where R is GlcNAcβ1-4(+/-Fucα1-6)GlcNAc-Asn-X, and Asn-X may
be an Asn residue which is part of the amino acid sequence
of a protein.

The enzyme is specific for the Manα1-3Manβ1-4GlcNAc-arm
of the core. The presence of a β2-linked GlcNAc residue at
the non-reducing terminus of this arm is essential for
subsequent action of several enzymes in the processing
pathway (Schachter et al (1983) Can. J. Biochem. Cell Biol.,
vol. 61, 1049-1066; Schachter et al (1985)
"Glycosyltransferases involved in the biosynthesis of
protein-bound oligosaccharides of the
asparagine-N-acetyl-D-glucosamine and
serine(threonine)-N-acetyl-D-galactosamine types", in: A.N.
Martonosi, ed. The Enzymes of Biological Membranes, New
York, N.Y., Plenum Press, 227-277; Schachter (1986)
Biochem. Cell Biol., vol. 64, 163-181; Schachter (1988)
Biochimie, vol. 70(11), 1701-1702), i.e., GnT II, III and
IV require the prior action of GnT I, and GnT V and VI
require the prior action of GnT II. GnT I has been reported
in hen oviduct, Chinese hamster ovary cells, baby hamster
kidney cells, bovine colostrum, pig trachea and mammalian
liver (Schachter et al (1983) Can. J. Biochem. Cell Biol.,
vol. 61, 1049-1066; Schachter et al (1985)
"Glycosyltransferases involved in the biosynthesis of
protein-bound oligosaccharides of the
asparagine-N-acetyl-D-glucosamine and
serine(threonine)-N-acetyl-D-galactosamine types", in: A.N.
Martonosi, ed. The Enzymes of Biological Membranes, New York,
N.Y., Plenum Press, 227-277; Schachter et al (1980) "Mammalian
glycosyltransferases: their role in the synthesis and
function of complex carbohydrates and glycolipids", in:
Lennarz W.J., ed. Biochemistry of Glycoproteins and
Proteoglycans, New York, N.Y., Plenum Press, 85-160;
Brockhausen et al (1988) Biochem. Cell Biol., vol. 66,
1134-1151). The enzyme has been partially purified from bovine
colostrum (Harpaz et al (1980) J. Biol. Chem., vol. 255,
4885-4893) and from pig liver and trachea (Oppenheimer et al
(1981) J. Biol. Chem., vol. 256, 11477-11482), and to
homogeneity from rabbit liver (Oppenheimer et al (1981)
J. Biol. Chem., vol. 256, 799-804; Nishikawa et al (1988)
J. Biol. Chem., vol. 263, 8270-8281).
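
The order of enzyme action described above (GnT II, III and IV require the prior action of GnT I; GnT V and VI require the prior action of GnT II) can be written down as a small dependency table. The following Python sketch is offered only as an illustration of that stated ordering; the names PREREQUISITES and enzymes_able_to_act are hypothetical, and the table models nothing beyond the two requirements quoted in the text.

```python
# Prerequisites exactly as stated in the text: GnT II, III and IV need GnT I;
# GnT V and VI need GnT II. Nothing else is modeled (assumption of this sketch).
PREREQUISITES = {
    "GnT I": set(),
    "GnT II": {"GnT I"},
    "GnT III": {"GnT I"},
    "GnT IV": {"GnT I"},
    "GnT V": {"GnT II"},
    "GnT VI": {"GnT II"},
}


def enzymes_able_to_act(already_acted):
    """Return, sorted by name, the enzymes whose stated prerequisites have all acted."""
    done = set(already_acted)
    return sorted(
        enzyme
        for enzyme, needs in PREREQUISITES.items()
        if enzyme not in done and needs <= done
    )


if __name__ == "__main__":
    print(enzymes_able_to_act([]))          # only GnT I can start
    print(enzymes_able_to_act(["GnT I"]))   # GnT II, III and IV become possible
```

In this representation a host lacking GnT I can never reach any of the later steps, which is the point the paragraph makes about the enzyme controlling entry into the hybrid and complex pathways.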

Recently, the cloning of DNA encoding proteins and the
expression of such cloned DNA to produce the proteins has
become commercially important. For ease of culturing, it is
preferred that the cloned DNA be expressed in a primitive
host, such as a bacterium (e.g., E. coli), a yeast, or a
fungus. However, such primitive hosts may not normally
possess the enzymes required for the post-translation
modification of proteins which occurs in the cells from
which the DNA originated. Thus, although many primitive
hosts possess the necessary enzymes to effect the
post-translation modification of a protein to a high mannose
derivative, such hosts do not contain the enzyme required to
convert the high mannose derivative to a hybrid and branched
glycan, GnT I.

As discussed in Bergh et al, "Glycosylation of
Heterologously Expressed Proteins: Problems and Solutions",
in Therapeutic Peptides and Proteins: Assessing the New
Technologies, Marshak et al eds., Cold Spring Harbor
Laboratory, Banbury Report 29, 1988, in prokaryotes, the
resulting lack of glycosylation may have a variety of
consequences, such as incorrect polypeptide chain folding,
precipitation and aggregation of the protein, proteolytic
degradation or enhanced immunogenicity.

Yeast and vertebrate cells use the same Glc3Man9GlcNAc2
lipid-linked precursor for cotranslational glycosylation of
asparagine residues, both recognize the same Asn-X-Ser/Thr
sequences, and both remove the three glucose residues soon
after transfer. Thus, a mammalian glycoprotein expressed in
yeast may contain the same carbohydrate chains as the native
protein until after it leaves the endoplasmic reticulum.
After entry into the Golgi, however, the later steps in
oligosaccharide processing are very different in yeast (see
Kukuruzinska et al, Ann. Rev. Biochem., vol. 56, p. 915,
1987) and vertebrates (see Hubbard and Ivatt, Ann. Rev.
Biochem., vol. 50, p. 555, 1981; Kornfeld and Kornfeld,
Ann. Rev. Biochem., vol. 54, p. 631, 1985). Processed
Saccharomyces cerevisiae N-linked oligosaccharides contain
two GlcNAc residues and from 9 to 50 or more mannose
residues. On the other hand, mammalian oligosaccharides
never have more than nine mannose residues and most commonly
contain GlcNAc, galactose, and sialic acid attached to a
Man3GlcNAc2 core.

Thus, heterologous expression in yeast of a mammalian 
glycoprotein intended for therapeutic use can present a 
number of potential glycosylation-related problems. For 
example, carbohydrate chains may be highly antigenic; in 
addition, they are recognized by Man/GlcNAc-specific
receptors on cells of the mammalian reticuloendothelial 
system, resulting in rapid clearance of the glycoprotein 
from the circulation. 

Thus, it is desirable to: (1) provide large amounts of 
GnT I for the further post-translational modification of
recombinantly produced proteins; and (2) provide a means for 
enabling primitive hosts to express GnT I. 

However, as yet there are no methods available for 
obtaining large quantities of GnT I or enabling primitive 
hosts to express GnT I. 

SUMMARY OF THE INVENTION 

Accordingly, it is an object of the present invention
to provide a method for producing large quantities of GnT I.

It is another object to provide a method for converting
high mannose derivatives to hybrid and complex N-glycans.

It is another object to provide isolated DNA sequences
which encode GnT I.

It is another object to provide plasmids which contain 
a DNA sequence which encodes GnT I. 

It is another object to provide microorganisms which 
contain a heterologous sequence of DNA which encodes GnT I. 

These and other objects, which will become apparent
during the following detailed description, have been
achieved by the inventors' isolation and cloning of DNA
sequences encoding rabbit and human GnT I, preparation of
plasmids containing such DNA sequences and transfection of
microorganisms with such plasmids.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the invention and many 
of the attendant advantages thereof will be readily obtained 
as the same become better understood by reference to the 
following detailed description when considered in connection 
with the accompanying drawings, wherein: 

Figure 1 illustrates the amino acid sequence data for
the eight peptides isolated from rabbit liver GnT I and
nucleotide sequences of the six synthetic oligonucleotides
prepared on the basis of the peptide sequences. The single
letter code is used for amino acid sequence data; upper case
letters indicate firm assignments and lower case letters
indicate tentative assignments. The underlined sections of
the peptide sequences indicate the regions used for the
design of oligonucleotide probes. Probes 2, 3 and 6 were
based on peptides 2, 3 and 6, respectively; S indicates
"sense" and A indicates "antisense" directions;

Figure 2 illustrates a schematic representation of
GnT I clones. PCR product, product obtained by PCR
amplification of rabbit liver cDNA; rc1600, 1.6 kb GnT I
cDNA clone; rc2500, 3.0 kb GnT I cDNA clone. The shaded
boxes represent the coding region. During subcloning, the
3.0 kb cDNA was reduced to 2.5 kb by a 0.5 kb deletion at
the 5'-end;

Figure 3 illustrates the results of an agarose gel
electrophoresis (1% agarose) of the products of the
polymerase chain reaction (PCR) using rabbit liver cDNA as
template and the following combinations of oligonucleotides
as primers: 2S-3A; 2S-6A; 3S-2A; 3S-6A; 6S-2A; 6S-3A
(Figure 1). Conditions of PCR are given in the Methods
section. The gel was stained with ethidium bromide
(0.5 μg/ml). Primer-dependent products were obtained with
combinations 2S-6A (0.50 kb) and 3S-6A (0.45 kb). The arrow
designates the 0.5 kb DNA marker; the remaining standards
are at 1.0 kb, 1.6 kb, 2.0 kb and at 1.0 kb intervals
thereafter;

Figure 4 illustrates the nucleotide sequence (lower
case) of the 2.5 kb GnT I cDNA clone. The amino acid
sequence in the coding region is shown in upper case
letters. The positions of the eight peptide sequences
obtained from proteolytic digests of GnT I (Figure 1) are
underlined with a single solid line; the regions of these
peptide sequences used for oligonucleotide probe synthesis
(Figure 1) are additionally underlined with a discontinuous
line. The putative transmembrane segment (bases 62-136) is
underlined with a double line. The consensus
polyadenylation signal AATAAA at position 2435 is
underlined. Only the nucleotide sequence is numbered;

Figure 5 illustrates an autoradiogram of an
SDS-polyacrylamide gel electrophoresis experiment showing in
vitro transcription and translation of the rabbit cDNA.
mRNA was generated from the 2.5 kb GnT I cDNA and was used
as the template for in vitro translation using rabbit
reticulocyte lysate and L-[35S]-methionine (see Methods for
details). Lane C, no plasmid in the incubation; lane 12,
pGEM-7z containing the 2.5 kb GnT I cDNA with an insert
between bases 56 and 57 which interrupts the reading frame;
lane 16, pGEM-7z containing the 2.5 kb GnT I cDNA
(pGEM-7z-rcgntl);

Figure 6 illustrates the nucleotide sequence for human
genomic DNA encoding for GnT I;

Figure 7 illustrates the amino acid sequence for human
GnT I; and

Figure 8 illustrates both the nucleotide sequence for
human genomic DNA encoding for GnT I and the amino acid
sequence of human GnT I.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Thus, one aspect of the present invention relates to
isolated DNA sequences which encode rabbit GnT I.
Specifically, such DNA sequences encode a protein having the
sequence (starting from the N-terminus) of formula I shown
below:

MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE
LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR
ARG PRO VAL PRO SER ARG LEU PRO SER ASP ASN ALA LEU ASP ASP
ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP
ALA GLU VAL GLU LEU GLU ARG GLN ARG GLY LEU LEU GLN GLN ILE
ARG GLU HIS HIS ALA LEU TRP SER GLN ARG TRP LYS VAL PRO THR
ALA ALA PRO PRO ALA GLN PRO HIS VAL PRO VAL THR PRO PRO PRO
ALA VAL ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL
ARG ARG CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU
LEU PHE PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR
ALA GLN VAL ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG
GLN PRO ASP LEU SER ASN ILE ALA VAL GLN PRO ASP HIS ARG LYS
PHE GLN GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU
GLY GLN ILE PHE HIS ASN PHE ASN TYR PRO ALA ALA VAL VAL VAL
GLU ASP ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE GLN
ALA THR TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL
SER ALA TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP SER SER
LYS PRO GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY
TRP LEU LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP
PRO LYS ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG
LYS GLY ARG ALA CYS VAL ARG PRO GLU ILE SER ARG THR MET THR
PHE GLY ARG LYS GLY VAL SER HIS GLY GLN PHE PHE ASP GLN HIS
LEU LYS PHE ILE LYS LEU ASN GLN GLN PHE VAL PRO PHE THR GLN
LEU ASP LEU SER TYR LEU GLN GLN GLU ALA TYR ASP ARG ASP PHE
LEU ALA ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL
ARG THR ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR
THR GLY ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL
MET ASP ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY
ILE VAL THR PHE LEU PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO
PRO GLN THR TRP ASP GLY TYR ASP PRO SER TRP THR

In another aspect, the present invention relates to DNA
sequences which encode human GnT I. Such DNA sequences
encode a protein having the sequence (starting from the
N-terminus) of formula II shown below:

1: MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE
16: LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR 
31: ARG PRO ALA PRO GLY ARG PRO PRO SER VAL SER ALA LEU ASP GLY 
46: ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP 
61: ALA GLU VAL GLU LEU GLU ARG ARG ARG GLY LEU LEU GLN GLN ILE 
76: GLY ASP ALA LEU SER SER GLN ARG GLY ARG VAL PRO THR ALA ALA 
91: PRO PRO ALA GLN PRO ARG VAL PRO VAL THR PRO ALA PRO ALA VAL 
106: ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL ARG ARG 
121: CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU LEU PHE
136: PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR ALA GLN 
151: ALA ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG GLN PRO 
166: ASP LEU SER SER ILE ALA VAL PRO PRO ASP HIS ARG LYS PHE GLN 
181: GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU GLY GLN 
196: VAL PHE ARG GLN PHE ARG PHE PRO ALA ALA VAL VAL VAL GLU ASP 
211: ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE ARG ALA THR 
226: TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL SER ALA 
241: TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP ALA SER ARG PRO 
256: GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY TRP LEU 
271: LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP PRO LYS 
286: ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG GLN GLY 
301: ARG ALA CYS ILE ARG PRO GLU ILE SER ARG THR MET THR PHE GLY 
316: ARG LYS GLY VAL THR HIS GLY GLN PHE PHE ASP GLN HIS LEU LYS 
331: PHE ILE LYS LEU ASN GLN GLN PHE VAL HIS PHE THR GLN LEU ASP 
346: LEU SER TYR LEU GLN ARG GLU ALA TYR ASP ARG ASP PHE LEU ALA 
361: ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL ARG THR
376: ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR THR GLY 
391: ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL MET ASP 
406: ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY ILE VAL
421: THR PHE GLN PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO PRO PRO
436: THR TRP GLU GLY TYR ASP PRO SER TRP ASN

Exemplary of the DNA sequences encoding rabbit GnT I is 
the sequence (starting from the 5'-terminus) of formula III,
shown below: 



atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
gag gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
cct cag act tgg gat ggc tat gat cct agt tgg act









The DNA sequence of formula III corresponds to the
coding region of rabbit cDNA encoding GnT I. Another
example of a DNA sequence encoding rabbit GnT I is a larger
section of cDNA encoding rabbit GnT I, which has the formula
IV as shown below:

1 gaattccggc aagtcatacc tttgcctgcc ctcccctgtg ggggccagg 

atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
gaa gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
cct cag act tgg gat ggc tat gat cct agt tgg act

caacagctcc tgcctgtccc ttctgggctc cttccttgca atttcatgat ctaagatggg 
accgtagtcc ctgggctgca ttgtcttttc tgtctttccc tcttgggtcc attttttttt
tttccttttt tgagtggcat ttgaatacac agatgacaag gtgagggttc ttttgttaaa
ggagttagat cagggaaagc attctgctgt ctgttgggta tcaagcagca aaccactgtg
tgatagggga agaatgggcc ttttggggcc agaaatatcc atgttctgag tttttctctt
aggtcatctg cagaggagtt ggcaacttta gctttcttaa ccaggccttt tctttctgac
ctgagagcca gggcatgaga cttcttgttc atgctccttt ttaccttccc ctaataaggg 
tctgggctac aggagaagtg aacatattgt ggccagaata atactaacca gaggggcctc 
attgtcagag tctaggtgca gttattgggt tgtcagagtt aatgccttct gttcttcttt 
ccttattcct gacttctgtc agctcttctt tctttgcagc ctagcaattt ttggttctaa 
gatgaaaaat gaagaggaaa agaaatattc gcacccagct attgggagaa aggtagtggg 
aaaaaaactt cattgtacca cttcaaagag acactcttga cctcttcctt tctaaaaatt 
agtcccctcc ctgttgcttc aggagaatgc tgtgctggtc agttctgtgt gatccttctt 
ccctgagttt tatacacagg ctcctcccta aggctgtggc ttctggtggc cctcctgaca 
taagttacag tggccaagac caggacaact ccggccatga gctaagtcct gcctaccttc 
tccaaaacat tcccatgtcc tcacaggcta ggatgcagat gttggttgga gaggaatttg 
tgtgtgtgtg tgtgtgtgtg tgtgttttct tgcctgacct cagtttcatg gatgaaaagt
ggaagctaca gaattatttt caaaaataaa ggctgaattg tctgaaaaaa aaaaaaaaaa
aaaaaaccgg aattc 



The DNA sequences of formulae III and IV have been
obtained by cloning the rabbit cDNA encoding GnT I, by the
procedure which is described in detail in the Examples
section.
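
The relationship between the codon sequence of formula III and the amino acid sequence of formula I can be checked by translating the codons with the standard genetic code. The following Python sketch is offered only as an illustration and translates the first fifteen codons of formula III; the names CODON_TABLE and translate are hypothetical, and the one-letter amino acid code is used in place of the three-letter code of formula I.

```python
from itertools import product

# Standard genetic code, laid out in the conventional t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(codons):
    """Translate a whitespace-separated string of lower-case codons."""
    return "".join(CODON_TABLE[codon] for codon in codons.split())


# First fifteen codons of formula III (rabbit GnT I coding region).
FIRST_ROW = "atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc"

if __name__ == "__main__":
    # Formula I begins MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE.
    print(translate(FIRST_ROW))
```

The same function can be applied row by row to the remainder of formula III to compare each line against the corresponding line of formula I.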

Exemplary of the DNA sequences encoding human GnT I is 
the sequence (starting at the 5'-terminus) of formula V,
shown below: 

atgctgaa gaagcagtct gcagggcttg tgctgtgggg cgctatcctc tttgtggcct 
961 ggaatgccct gctgctcctc ttcttctgga cgcgcccagc acctggcagg ccaccctcag 
1021 tcagcgctct cgatggcgac cccgccagcc tcacccggga agtgattcgc ctggcccaag 
1081 acgccgaggt ggagctggag cgcaggcgtg ggctgctgca gcagatcggg gatgccctgt 
1141 cgagccagcg ggggagggtg cccaccgcgg cccctcccgc ccagccgcgt gtgcctgtga 
1201 cccccgcgcc ggcggtgatt cccatcctgg tcatcgcctg tgaccgcagc actgttcggc 
1261 gctgcctgga caagctgctg cattatcggc cctcggctga gctcttcccc atcatcgtta
1321 gccaggactg cgggcacgag gagacggccc aggccatcgc ctcctacggc agcgcggtca 
1381 cgcacatccg gcagcccgac ctgagcagca ttgcggtgcc gccggaccac cgcaagttcc 
1441 agggctacta caagatcgcg cgccactacc gctgggcgct gggccaggtc ttccggcagt 
1501 ttcgcttccc cgcggccgtg gtggtggagg atgacctgga ggtggccccg gacttcttcg 
1561 agtactttcg ggccacctat ccgctgctga aggccgaccc ctccctgtgg tgcgtctcgg
1621 cctggaatga caacggcaag gagcagatgg tggacgccag caggcctgag ctgctctacc 
1681 gcaccgactt tttccctggc ctgggctggc tgctgttggc cgagctctgg gctgagctgg 
1741 agcccaagtg gccaaaggcc ttctgggacg actggatgcg gcggccggag cagcggcagg 
1801 ggcgggcctg catacgccct gagatctcaa gaacgatgac ctttggccgc aagggtgtga 
1861 cgcacgggca gttctttgac cagcacctca agtttatcaa gctgaaccag cagtttgtgc 
1921 acttcaccca gctggacctg tcttacctgc agcgggaggc ctatgaccga gatttcctcg
1981 cccgcgtcta cggtgctccc cagctgcagg tggagaaagt gaggaccaat gaccggaagg 
2041 agctggggga ggtgcgggtg cagtatacgg ggagggacag cttcaaggct ttcgccaagg 
2101 ctctgggtgt tatggatgac cttaagtcgg gggttccgag agctggctac cggggtattg 
2161 tcaccttcca gttccggggc cgccgtgtcc acctggcgcc cccaccgacg tgggagggct 
2221 atgatcctag ctggaat 



The DNA sequence of formula V corresponds to the coding
region of human genomic DNA encoding GnT I. Another example
of a DNA sequence encoding human GnT I is a larger section
of human genomic DNA encoding GnT I, which has the formula
VI, shown below:

1 aagttttgaa tgtttaagtt tatttaagtt tatttctaaa tattttctca tttctctggc
61 ttttgtaagt agggttttct catccatgtt ttcttctcat gagttatttg tggatatgaa 
121 ggctatccat tagtatatgt tgatttttat atracacttrc cttgctcagt tcattattga
181 ttctttttga gttttccagg catattctca caagtaaaga taa^agaaat agtttgcttc
241 ctttccactt ctgctttgaa tttttttttc ttggttcatt tgcattggct gcttcctcca
301 gcaaaatgtt aaataaccct ggagatgatg ggcaacttcg ttttgctcct gacattcgtg 
361 gggtgcctct ggtgcttccc tgttggtaag gggttaactg tagccctgag gtgggacatt
421 tga^ttcaaa aatcagtcat cttggggcgc ttaggttaga ggaatggtag gcagatgctg
481 tcactccttg cccctcccct cctccttccc acctggaggg gaaatgaaat ctgacaggta
541 gaaagagggg agttggggtt ctttttctct ctccctccac cagcatcact ctctgcctct
601 ccctcaaaaa tacgttcctg ggtcaggata tatgttgact ccctagagag ctctggagtc
661 aacctcctgg ccttcctcca ccctcactct tggccttttc ctgcccccat ttcctctacc 
721 tgtggggcat ggagccacga gcctttgtgt gacggtttgc tttctctctc ctgtctttag 
781 gtgcatggct gcctcctaat cccatagtcc agaggaggca tccctaggac tgcgggcaag
841 ggagccgcaa gcccagggca gccttgaacc gtcccctggc ctgccctccg gtgggggcca 
901 ggatgctgaa gaagcagtct gcagggcttg tgctgtgggg cgctatcctc tttgtggcct
961 ggaatgccct gctgctcctc ttcttctgga cgcgcccagc acctggcagg ccaccctcag
1021 tcagcgctct cgatggcgac cccgccagcc tcacccggga agtgattcgc ctggcccaag 
1081 acgccgaggt ggagctggag cgcaggcgtg ggctgctgca gcagatcggg gatgccctgt 
1141 cgagccagcg ggggagggtg cccaccgcgg cccctcccgc ccagccgcgt gtgcctgtga 
1201 cccccgcgcc ggcggtgatt cccatcctgg tcatcgcctg tgaccgcagc actgttcggc 
1261 gctgcctgga caagctgctg cattatcggc cctcggctga gctcttcccc atcatcgtta
1321 gccaggactg cgggcacgag gagacggccc aggccatcgc ctcctacggc agcgcggtca 
1381 cgcacatccg gcagcccgac ctgagcagca ttgcggtgcc gccggaccac cgcaagttcc
1441 agggctacta caagatcgcg cgccactacc gctgggcgct gggccaggtc ttccggcagt
1501 ttcgcttccc cgcggccgtg gtggtggagg atgacctgga ggtggccccg gacttcttcg 
1561 agtactttcg ggccacctat ccgctgctga aggccgaccc ctccctgtgg tgcgtctcgg
1621 cctggaatga caacggcaag gagcagatgg tggacgccag caggcctgag ctgctctacc
1681 gcaccgactt tttccctggc ctgggctggc tgctgttggc cgagctctgg gctgagctgg
1741 agcccaagtg gccaaaggcc ttctgggacg actggatgcg gcggccggag cagcggcagg 
1801 ggcgggcctg catacgccct gagatctcaa gaacgatgac ctttggccgc aagggtgtga 
1861 cgcacgggca gttctttgac cagcacctca agtttatcaa gctgaaccag cagtttgtgc 
1921 acttcaccca gctggacctg tcttacctgc agcgggaggc ctatgaccga gatttcctcg 
1981 cccgcgtcta cggtgctccc cagctgcagg tggagaaagt gaggaccaat gaccggaagg 
2041 agctggggga ggtgcgggtg cagtatacgg ggagggacag cttcaaggct ttcgccaagg 
2101 ctctgggtgt tatggatgac cttaagtcgg gggttccgag agctggctac cggggtattg 
2161 tcaccttcca gttccggggc cgccgtgtcc acctggcgcc cccaccgacg tgggagggct
2221 atgatcctag ctggaattag cacctgcctg tccttcctgg gccccttctt gccacatcat
2281 gagctgaggt gaccacagrc cccaggctgc atcggcctgc ctgtgtttcc ctcttaggtg
2341 catttatctt tttgattttt ccgagtggca tttaagtgca caaatgataa caagaggatt
2401 attctcccgt tctcaaggga gtcagatcag gggaactatt ctagggtatg ttgcggggta
2461 ttaagcagga aaacactgtg tggtgggggg cactgggctt gttggggcca caaatgtcca 
2521 cgtcctgagc tttctcctgg agcatgtgca gagagtttgg caacgttcgc tctcttgacc
2581 agaccccttc tccctgactg gctcttccag ccaggcacga gccctccttc tatacctgct 
2641 ccccttccca gtggggactg agttatggga gaaggggaca tatttgtggc caaaatgata 
2701 ctaaccaaag gggcttcctt gtcagggcct ggtggagttg gtgggtcatc ggggctcact
2761 gcctcctgcc cttctctcct gtctgacccc cacttagccc ttctctcctt gcagcctagc 
2821 agtttatagt tctgagatgg aaagttgaag ggggcaagca agacctctcc tcagcccatg 
2881 cccagctgtc aggagagagg tgcagggagg aaggccttgt gctgggacaa cctctctctt 
2941 gccttacctt cagagaggac tatgccctga cccctccttt ctgaaaatca gtgccctcrrc 
3001 tgttgctcta ggaggctcct gctggcttgg tagaagacag aattcgatct gcctgtccct
3061 ttttcccctg gggtttgaca cacaggctcc tctcagcatg aggtggagca gtgaccaggt
3121 ggagcagtga ccaggacgcc tctggcccag tgctgcccag cctccccgcc cgctcccagg
3181 cgccccacgt cctcacaggc caggacgcca cggcggccgg gagcatgcga 

The DNA sequences of formulae V and VI have been 
obtained by cloning human genomic DNA encoding GnT I, by the 
procedure which is described in detail in the Examples 
section. 
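
The position numbers printed beside formula V appear to be those of formula VI, since formula V is the coding region contained within the larger genomic sequence. The following Python sketch is offered only as an illustration, using the first line of each formula as transcribed here; the variable names are hypothetical. It locates the start codon in line 901 of formula VI and checks that the next line label of formula V follows from it.

```python
# Line of formula VI labeled 901, with the spacing between groups removed.
LINE_LABEL = 901
line_901 = (
    "ggatgctgaa gaagcagtct gcagggcttg tgctgtgggg cgctatcctc tttgtggcct"
).replace(" ", "")

# First (unlabeled) line of formula V, with spacing removed.
formula_v_first_line = (
    "atgctgaa gaagcagtct gcagggcttg tgctgtgggg cgctatcctc tttgtggcct"
).replace(" ", "")

# Offset of the first atg within line 901 (0-based), then its 1-based position.
offset = line_901.find("atg")
start_position = LINE_LABEL + offset

# Label expected on the line that follows the first line of formula V.
next_label = start_position + len(formula_v_first_line)

if __name__ == "__main__":
    print(start_position, next_label)
```

Under these assumptions the coding region begins at position 903 of formula VI, and the label 961 on the second line of formula V is consistent with that starting point.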

Of course, it is to be understood that the present DNA
sequences also include those which may not exactly match the 
sequences of formulae III-VI, but rather contain a small 
number of nucleotide substitutions, deletions, and/or 
additions. Further, the present DNA sequences also include 
those which encode for amino acid sequences which may not 
exactly match the sequences of formulae I and II, but rather 
contain a small number of amino acid residue substitutions, 
deletions, and/or additions, provided that the protein 
encoded by the DNA sequence exhibits GnT I activity. 
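
As an illustration of what residue substitutions between two GnT I sequences look like, residues 31 to 45 of formula I (rabbit) and formula II (human) can be compared position by position. The following Python sketch is offered only as an example; the function name count_identities is hypothetical, the window was chosen because the two sequences have no insertions or deletions before it, and the result says nothing about how many changes the text regards as "a small number".

```python
# Residues 31-45 of formula I (rabbit) and formula II (human), three-letter code.
RABBIT_31_45 = "ARG PRO VAL PRO SER ARG LEU PRO SER ASP ASN ALA LEU ASP ASP".split()
HUMAN_31_45 = "ARG PRO ALA PRO GLY ARG PRO PRO SER VAL SER ALA LEU ASP GLY".split()


def count_identities(seq_a, seq_b):
    """Count positions at which two equal-length residue lists agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length for this comparison")
    return sum(a == b for a, b in zip(seq_a, seq_b))


def differing_positions(seq_a, seq_b, first_position):
    """Return the residue numbers at which the two lists differ."""
    return [
        first_position + i
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


if __name__ == "__main__":
    print(count_identities(RABBIT_31_45, HUMAN_31_45), "of", len(RABBIT_31_45))
    print(differing_positions(RABBIT_31_45, HUMAN_31_45, 31))
```

A comparison over the full length would first require an alignment, because formula I has 447 residues and formula II has 445.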

In another embodiment, the present invention relates to
plasmids which contain a DNA sequence encoding rabbit or
human GnT I. Such plasmids may be prepared by conventional
techniques and include plasmids formed by inserting one of
the present DNA sequences into any suitable plasmid.
Specific examples of the present plasmids include
pGEM-7z-rcgntl, in which a 2.5 kb sequence of rabbit cDNA
encoding for GnT I (Figure 2) has been inserted into
pGEM-7z; pGEX-2t-rcgntl, in which a 2.5 kb sequence of
rabbit cDNA encoding GnT I has been inserted into pGEX-2t;
and pGEM-5z-hggntl, in which a 4 kb sequence of human
genomic DNA encoding GnT I has been inserted into pGEM-5z.
The preparation of the plasmids pGEM-7z-rcgntl,
pGEX-2t-rcgntl, and pGEM-5z-hggntl is described in detail in
the Examples section, and all three of these plasmids have
been deposited under the provisions of the Budapest Treaty
with the American Type Culture Collection, 12301 Parklawn
Drive, Rockville, MD 20852, USA on November 30, 1990
(Accession numbers not yet known).

In another embodiment, the present invention relates to 
transformed microorganisms which contain a heterologous 
sequence of DNA encoding rabbit or human GnT I. Examples of
suitable host cells include: bacteria, such as E. coli,
Brevibacteria, and Coryneforms; fungi, such as Trichoderma
reesei, Aspergillus niger, and Aspergillus awamori; yeast,
such as Saccharomyces cerevisiae, Candida albicans, Candida
utilis, Candida parapsilosis, Schizosaccharomyces pombe,
Bandeiraea simplicifolia, Kluyveromyces lactis,
Saccharomyces kluyveri, Hansenula, Saccharomycodes and
Pichia; and vertebrate cells such as Chinese hamster ovary
cells and COS cells. The transformed cells may be prepared
by transfecting the cells with any of the present plasmids
by conventional methods.

Another aspect of the present invention relates to 
methods for the production of GnT I. In a first embodiment, 
the present method comprises cell-free or in vitro 
expression of one of the present DNA sequences to obtain 
GnT I. For example, in vitro transcription and translation 
of one of the present plasmids using a system such as 
described in Methods in Molecular Biology, Nucleic Acids, 
Walker, ed., Humana Press, Clifton, NJ, pp 145-155 (1984) 
yields GnT I. 

In another embodiment, the present method comprises 
culturing a microorganism which contains a heterologous 
DNA sequence which corresponds to one of the present DNA 
sequences. Although the culturing conditions, such as time, 
medium, temperature, light, and agitation, will depend on 
the identity of the host microorganism and the yield of 
GnT I desired, these conditions are readily determined by 
those skilled in the art. 

In a further aspect, the present invention relates to a 
method for converting a glycoprotein which is in the high 
mannose form to a glycoprotein which is in the form of a 
hybrid or complex N-glycan. In a first embodiment, the 
present method may be carried out by reacting, in vitro, a 
glycoprotein which is in the high mannose form with 
mannosidases followed by UDP-GlcNAc in the presence of 
GnT I. 

In another embodiment, the present method may comprise 
culturing a cell which produces a glycoprotein in high 
mannose form and which also contains a heterologous sequence 
of DNA encoding human or rabbit GnT I. For example, 
transfection of a cell, which normally produces a 
glycoprotein in a high mannose form, with one of the present 
plasmids may be used to form a cell which produces the 
protein (produced in high mannose form before transfection) 
as a hybrid or complex N-glycan. Preferably, the 
glycoprotein, which is produced in the high mannose form 
prior to transfection with the present DNA, is also produced 
by the host cell as a result of transformation. In other 
words, the DNA encoding the glycoprotein is also 
heterologous with respect to the host cell. 

Examples of such glycoproteins are described in Tanner 
et al, Biochimica et Biophysica Acta, vol. 906, pp. 81-99 
(1987); and Kukuruzinska et al, Ann. Rev. Biochem., vol. 56, 
pp 915-944 (1987) and include SUC 2, CSF, c-IgM μ chain, 
c-IgM chain, c-amylase, c-HBsAg, c-hemagglutinin, 
c-α1-antitrypsin, c-pre-α1-antitrypsin, c-glycoamylase, 
c-VSV gp, c-sindbis virus E1 gp, c-sindbis virus E2 gp, 
c-killer protoxin (type I), c-phaseolin α and β, hepatitis B 
virus surface antigen, interferon-gamma, tissue plasminogen 
activator, monoclonal antibodies, chicken ovalbumin-like 
proteins, interleukin-2, and proteins from vesicular 
stomatitis, influenza, and Semliki Forest viruses. 

As noted above, branched glycans on membrane 
glycoproteins have been implicated in a variety of 
biological phenomena, e.g. tumor progression and metastasis, 
embryogenesis, cell differentiation, cell-cell and 
receptor-ligand interactions, viral and bacterial 
infectivity, fertilization and the control of the immune 
system (Rademacher et al (1988) Ann. Rev. Biochem., vol. 57, 
785-838; Pierce et al (1986) J. Biol. Chem., vol. 261, 
10772-10777; Yamashita et al (1985) J. Biol. Chem., 
vol. 260, 3963-3969; Schachter (1986) Biochem. Cell Biol., 
vol. 64, 163-181; West (1986) Mol. Cell. Biochem., vol. 72, 
3-20; Narasimhan et al (1988) J. Biol. Chem., vol. 263, 
1273-1281; Dennis et al (1987) Science, vol. 236, 582-585). 
GnT I catalyzes an essential first step in the conversion of 
high mannose to branched hybrid and complex N-glycans 
(Schachter (1986) Biochem. Cell Biol., vol. 64, 163-181; 
Brockhausen et al (1988) Biochem. Cell Biol., vol. 66, 
1134-1151). In vitro transcription/translation of the 
2.5 kb cDNA reported in this paper results in GnT I activity 
demonstrating the cloning of the gene for the catalytic 
domain of this important control enzyme. 

At least seven glycosyltransferases involved in the 
synthesis of N- and O-glycans have been cloned to date, 
i.e., UDP-Gal:GlcNAc-R β1,4-Gal-transferase (Appert et al 
(1986) Biochem. Biophys. Res. Commun., vol. 139, 163-168; 
D'Agostaro et al (1989) Eur. J. Biochem., vol. 183, 211-217; 
Masri et al (1988) Biochem. Biophys. Res. Commun., vol. 157, 
657-663; Narimatsu et al (1986) Proc. Nat. Acad. Sci. USA, 
vol. 83, 4720-4724; Shaper et al (1986) Proc. Nat. Acad. 
Sci. USA, vol. 83, 1573-1577; Shaper et al (1988) J. Biol. 
Chem., vol. 263, 10420-10428; Nakazawa et al (1988) 
J. Biochem. (Tokyo), vol. 104, 165-168), UDP-Gal:Gal-R 
α1,3-Gal-transferase (Joziasse et al (1989) J. Biol. Chem., 
vol. 264, 14290-14297; Larsen et al (1989) Proc. Natl. Acad. 
Sci. USA, vol. 86, 8227-8231; Larsen et al (1990) J. Biol. 
Chem., vol. 265, 7055-7061; Smith et al (1990) J. Biol. 
Chem., vol. 265, 6225-6234), CMP-sialic acid:Gal-R 
α2,6-sialyltransferase (Weinstein et al (1987) J. Biol. 
Chem., vol. 262, 17735-17743), CMP-sialic acid:Gal-R 
α2,3-sialyltransferase (Paulson et al (1990) FASEB J., vol. 
4, A1862), GDP-Fuc:Galβ1,4(3)GlcNAc-R (Fuc to 
GlcNAc) α1,3(4)-Fuc-transferase (Gersten et al (1990) 
FASEB J., vol. 4, A1930; Kukowska-Latallo et al (1990) 
FASEB J., vol. 4, A1930), GDP-Fuc:Gal-R α1,2-Fuc-transferase 
(Rajan et al (1989) J. Biol. Chem., vol. 264(19), 
11158-11167; Ernst et al (1989) J. Biol. Chem., 
vol. 264(6), 3436-3447) and UDP-GalNAc:Fucα1,2Gal-R 
(GalNAc to Gal) α1,3-GalNAc-transferase (Yamamoto et al 
(1990) J. Biol. Chem., vol. 265, 1146-1151). These 
transferases all place sugars in terminal or subterminal 
positions; three of them (β1,4-Gal-, α2,6-sialyl-, and 
α1,3-GalNAc-transferases) have been localized to the 
trans-Golgi cisternae and trans-Golgi network, at least in 
some tissues (Roth et al (1982) J. Cell Biol., vol. 92, 
223-229; Roth (1984) J. Cell Biol., vol. 98, 399-406; 
Roth (1987) Biochim. Biophys. Acta, vol. 906, 
405-436; Roth et al (1988) Eur. J. Cell Biol., vol. 46, 
105-112; Duncan et al (1988) J. Cell Biol., vol. 106, 
617-628; Lee et al (1989) J. Biol. Chem., vol. 264, 
13848-13855; Tooze et al (1988) J. Cell Biol., vol. 106, 
1475-1487; Berger et al (1985) Proc. Nat. Acad. Sci. USA, 
vol. 82, 4736-4739; Taatjes et al (1988) J. Biol. Chem., 
vol. 263, 6302-6309). Human α1,3-GalNAc-transferase and a 
human pseudogene showing homology to murine 
α1,3-Gal-transferase share 55% homology (Larsen et al (1990) 
J. Biol. Chem., vol. 265, 7055-7061). CMP-sialic acid:Gal-R 
α2,6- and α2,3-sialyltransferases exhibit 50% identity and 
80% conservation over a 50 amino acid stretch (Paulson et al 
(1990) FASEB J., vol. 4, A1862). The remaining transferases 
share no significant sequence similarities but have very 
similar domain structures, i.e., a short amino-terminal 
cytoplasmic tail, a 16-20 amino acid transmembrane segment 
(non-cleavable signal-anchor domain), a "stem" or "neck" 
region of undetermined length, and a long carboxy-terminal 
catalytic domain which is in the Golgi lumen (Paulson et al 
(1989) J. Biol. Chem., vol. 264, 17615-17618). 

The presence of a "neck" region is based on the finding 
that the α2,6-sialyltransferase (Weinstein et al (1987) 
J. Biol. Chem., vol. 262, 17735-17743; Lammers et al (1988) 
Biochem. J., vol. 256, 623-631) and the β1,4-Gal-transferase 
(D'Agostaro et al (1989) Eur. J. Biochem., vol. 183, 
211-217) can be cut by proteases to release a smaller 
catalytically active protein lacking the transmembrane 
domain. The exact length of this "neck" region cannot be 
stated with accuracy since it is not known how much of the 
amino-terminal sequence can be removed without loss of 
catalytic activity. It has been shown that rabbit liver 
GnT I (Nishikawa et al (1988) J. Biol. Chem., vol. 263, 
8270-8281) and rat liver UDP-GlcNAc:α-6-D-mannoside β-1,2-N- 
acetylglucosaminyltransferase II (GnT II) (Bendiak et al 
(1987) J. Biol. Chem., vol. 262, 5784-5790; Bendiak et al 
(1987) J. Biol. Chem., vol. 262, 5775-5783) exist in two 
forms, a large amount of presumably membrane-bound material 
which does not adhere to columns and a small amount of 
material which can be purified. In the case of GnT I, it is 
now clear from the sequence analysis that the 45 kDa form of 
the catalytically active protein previously purified has 
been derived from the membrane-bound precursor by 
proteolytic cleavage at about base position 215 in the 
"neck" region (Figure 4). The N-terminal blockage of this 
45 kDa protein must therefore be due to chemical 
modification during GnT I purification. The hydrophobic 
transmembrane region can form an α-helix with a hydrophobic 
surface capable of interacting with the membrane or with 
other hydrophobic proteins within the membrane. This strong 
hydrophobic interaction may explain why it is so difficult 
to purify glycosyltransferase preparations with intact 
transmembrane domains. 

Rabbit GnT I, human, mouse and bovine UDP-Gal:GlcNAc-R 
β1,4-Gal-transferases and human UDP-GalNAc:Fucα1,2Gal-R 
(GalNAc to Gal) α1,3-GalNAc-transferase have an abnormally 
high number of Pro residues between the transmembrane domain 
and the catalytic domain, e.g., there are 13 Pro residues in 
GnT I between the transmembrane domain and base position 376 
(Figure 4); 9 of these Pro residues occur in a short stretch 
of 21 amino acids (bases 314-376, Figure 4). This Pro-rich 
"neck" may play a role in positioning the catalytic domain 
in the lumen of the Golgi to enable glycosylation of 
glycoproteins moving along the Golgi lumen. 
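
As a rough check of the base range quoted above, the following Python sketch converts an inclusive base range of Figure 4 into residue numbers. It assumes 1-based base numbering and the reading frame set by the presumptive initiation ATG at position 50; the helper name `residues_for_bases` is purely illustrative and is not part of the original disclosure.

```python
# Assumption: base positions are 1-based coordinates from Figure 4 and
# translation starts at the ATG at position 50, as stated in the text.
START_CODON_POS = 50


def residues_for_bases(first_base: int, last_base: int) -> tuple[int, int]:
    """Map an inclusive base range to inclusive 1-based residue numbers."""
    if (first_base - START_CODON_POS) % 3 != 0:
        raise ValueError("range does not start on a codon boundary")
    n_bases = last_base - first_base + 1
    if n_bases % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    first_res = (first_base - START_CODON_POS) // 3 + 1
    return first_res, first_res + n_bases // 3 - 1


# Pro-rich stretch quoted in the text: bases 314-376, containing 9 Pro.
first, last = residues_for_bases(314, 376)
stretch_len = last - first + 1
pro_fraction = 9 / stretch_len
print(first, last, stretch_len, round(pro_fraction, 2))
```

Under these assumptions the range is in frame and spans 21 codons, which agrees with the "21 amino acids" stated in the text.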

The domain structure of GnT I appears to be similar to 
that of the previously cloned glycosyltransferases. 
However, GnT I differs from these transferases in being a 
medial-Golgi enzyme, at least in some tissues (Dunphy et al 
(1985) Cell, vol. 40, 463-472; Kornfeld et al (1985) Ann. 
Rev. Biochem., vol. 54, 631-664). Although no medial-Golgi 
glycosyltransferase has been cloned to date, rat liver 
α-mannosidase II (also a medial-Golgi enzyme) has been 
partially cloned (Moremen (1989) Proc. Natl. Acad. Sci. USA, 
vol. 86(14), 5276-5280). Comparison with GnT I reveals a 
16-amino acid sequence in GnT I (LHYRPSAELFPIIVSQ, bases 
431-478, Figure 4) which shows a high similarity score to 
amino acid residues 403-418 in α-mannosidase II 
(LQYRNYEQLFSYMNSQ). Paulson's group (Paulson et al (1989) 
J. Biol. Chem., vol. 264, 17615-17618; Colley et al (1989) 
J. Biol. Chem., vol. 264, 17619-17622) has suggested that 
the trans-Golgi retention signal lies in the amino-terminal 
57 amino acids of the α2,6-sialyltransferase molecule. The 
16-amino acid "consensus" sequence present in GnT I and 
α-mannosidase II may be the equivalent medial-Golgi 
retention signal. Joziasse et al (1989) J. Biol. Chem., 
vol. 264, 14290-14297, have suggested that a common 
hexapeptide sequence K(R)DKKND(E) may serve as a UDP-Gal 
binding site in the β1,4-Gal- and α1,3-Gal-transferases; 
this sequence is not present in GnT I. 
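
The two 16-residue sequences quoted above can be compared position by position with a few lines of Python. This is only a minimal sketch: it counts exact identities, which is a cruder measure than the similarity score referred to in the text (the scoring scheme used there is not stated), and the constant and function names are illustrative.

```python
# Sequences exactly as quoted in the text (one-letter amino acid codes).
GNT1_MOTIF = "LHYRPSAELFPIIVSQ"   # GnT I, bases 431-478 of Figure 4
MAN2_MOTIF = "LQYRNYEQLFSYMNSQ"   # alpha-mannosidase II, residues 403-418


def identical_positions(a: str, b: str) -> list[int]:
    """Return the 1-based positions at which two equal-length motifs match."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x == y]


matches = identical_positions(GNT1_MOTIF, MAN2_MOTIF)
percent_identity = 100 * len(matches) / len(GNT1_MOTIF)
print(matches, f"{percent_identity:.0f}% identity")
```

Counting identities alone ignores conservative substitutions, so a matrix-based score would generally rate the two motifs as more alike than the raw identity figure suggests.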

Sequence data indicate that the carboxy-terminal half 
of human GnT I shows 87% nucleotide sequence similarity and 
90% amino acid sequence similarity to the carboxy-terminal 
half of rabbit liver GnT I. Strong homology between species 
has also been observed for bovine, murine and human 
UDP-Gal:GlcNAc-R β1,4-Gal-transferase (Appert et al (1986) 
Biochem. Biophys. Res. Commun., vol. 139, 163-168; 
D'Agostaro et al (1989) Eur. J. Biochem., vol. 183, 211-217; 
Masri et al (1988) Biochem. Biophys. Res. Commun., vol. 157, 
657-663; Narimatsu et al (1986) Proc. Nat. Acad. Sci. USA, 
vol. 83, 4720-4724; Shaper et al (1986) Proc. Nat. Acad. 
Sci. USA, vol. 83, 1573-1577; Shaper et al (1988) J. Biol. 
Chem., vol. 263, 10420-10428; Nakazawa et al (1988) 
J. Biochem. (Tokyo), vol. 104, 165-168), bovine and murine 
UDP-Gal:Gal-R α1,3-Gal-transferase (Joziasse et al (1989) 
J. Biol. Chem., vol. 264, 14290-14297; Larsen et al (1989) 
Proc. Natl. Acad. Sci. USA, vol. 86, 8227-8231), murine and 
human GDP-Fuc:Galβ1,4(3)GlcNAc-R (Fuc to GlcNAc) 
α1,3(4)-Fuc-transferase (Gersten et al (1990) FASEB J., vol. 
4, A1930; Kukowska-Latallo et al (1990) FASEB J., vol. 4, 
A1930), and human and rat CMP-sialic acid:Gal-R 
α2,6-sialyltransferase (Lance et al (1989) Biochem. Biophys. 
Res. Commun., vol. 164, 225-232). 

It has been reported (Kumar et al (1990) Mol. Cell 
Biol., vol. 9, 5713-5717; Ripka et al (1989) Biochem. 
Biophys. Res. Commun., vol. 159(2), 554-560; Ripka et al 
(1990) J. Cellular Biochem., vol. 42, 117-122) that 
transformation of Lec1 Chinese hamster ovary (CHO) cell 
mutants (which lack GnT I) with a crude preparation of total 
human genomic DNA results in transfectants expressing GnT I 
enzyme activity; this approach should allow cloning of the 
human GnT I gene by the gene transfer and expression 
screening method recently used to clone several 
glycosyltransferases (Larsen et al (1989) Proc. Natl. Acad. 
Sci. USA, vol. 86, 8227-8231; Larsen et al (1990) J. Biol. 
Chem., vol. 265, 7055-7061; Smith et al (1990) J. Biol. 
Chem., vol. 265, 6225-6234; Gersten (1990) FASEB J., vol. 4, 
A1930; Kukowska-Latallo et al (1990) FASEB J., vol. 4, 
A1930; Rajan et al (1989) J. Biol. Chem., vol. 264(19), 
11158-11167; Ernst et al (1989) J. Biol. Chem., vol. 264(6), 
3436-3447). 

Other features of the invention will become apparent in 
the course of the following descriptions of exemplary 
embodiments which are given for illustration of the 
invention and are not intended to be limiting thereof. 

EXAMPLES 

I. Rabbit 

Preparation of Peptides. Rabbit liver GnT I was 
purified as previously described (Nishikawa et al (1988) 
J. Biol. Chem., vol. 263, 8270-8281). Glycerol, Triton 
X-100 and salts were removed from the purified enzyme 
(approximately 15 μg) by "inverse-gradient" reversed-phase 
high performance liquid chromatography (RP-HPLC) (Simpson et 
al (1987) Eur. J. Biochem., vol. 165, 21-29). The enzyme 
solution (100 μl) was diluted to 1.2 ml with n-propanol in a 
sample-loading syringe, thoroughly mixed, and loaded at 1 
ml/min on a VeloSep C8 cartridge (3-μm particle size, 30 x 
2.1 mm i.d.; Applied Biosystems, Foster City, CA, USA) 
previously equilibrated in 100% n-propanol at 40°C. GnT I 
was retained on the reversed-phase column under these 
conditions whereas glycerol, Triton X-100 and salts were 
washed through the column with 100% n-propanol. GnT I was 
eluted at 0.1 ml/min as a sharp peak by a linear gradient 
(5%/min) of decreasing n-propanol concentration (100% to 
50%) generated with 100% n-propanol and 50% n-propanol/50% 
water containing 0.4% (v/v) trifluoroacetic acid at 40°C. 
GnT I-containing fractions from the inverse gradient RP-HPLC 
were pooled, adjusted to 0.02% (w/v) with respect to Tween 
20 (Pierce Chemical Co., Rockford, IL, USA), concentrated to 
100 μl in a 1.5-ml polypropylene tube using a centrifugal 
vacuum concentrator to reduce the n-propanol concentration, 
and diluted to 1.5 ml with 5% (v/v) formic acid containing 
0.02% Tween 20. 
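
The elution conditions above imply a gradient duration and an eluent volume that the text does not state explicitly. A minimal Python sketch of that arithmetic follows; it assumes a strictly linear gradient and ignores system dead volume, and the function name `gradient_summary` is illustrative only.

```python
def gradient_summary(start_pct: float, end_pct: float,
                     slope_pct_per_min: float,
                     flow_ml_per_min: float) -> tuple[float, float]:
    """Return (duration in min, eluent volume in ml) for a linear gradient."""
    if slope_pct_per_min <= 0 or flow_ml_per_min <= 0:
        raise ValueError("slope and flow rate must be positive")
    duration_min = abs(start_pct - end_pct) / slope_pct_per_min
    return duration_min, duration_min * flow_ml_per_min


# Values quoted in the text: 100% to 50% n-propanol at 5%/min, 0.1 ml/min.
duration, volume = gradient_summary(100, 50, 5, 0.1)
print(f"gradient runs for {duration:g} min and delivers {volume:g} ml")
```

On these figures the whole inverse gradient is short and low in volume, which is consistent with the enzyme eluting as a sharp peak.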

Edman degradation of purified GnT I (200 pmol) 
yielded no N-terminal sequence indicating N-terminal 
blockage; proteolysis of GnT I was therefore undertaken. 
GnT I was digested with pepsin (Sigma) at an 
enzyme/substrate mass ratio of 1:20 for 1 h at 37°C and the 
digest was fractionated by RP-HPLC on a short microbore 
column (30 x 2.1 mm i.d.) employing a low pH 
(trifluoroacetic acid, pH 2.1) mobile phase and a gradient 
of acetonitrile to yield peptides 5 and 6 (Figure 1). Core 
GnT I remaining after pepsin digestion was reduced with 
dithiothreitol and alkylated with iodoacetic acid (Simpson 
et al (1988) Eur. J. Biochem., vol. 176, 187-197) to give 
core S-carboxymethylated (SCM)-GnT I which was purified by 
RP-HPLC (Simpson et al (1988) Eur. J. Biochem., vol. 176, 
187-197; Simpson et al (1989) Anal. Biochem., vol. 177, 
221-236). Pepsin-treated core SCM-GnT I (about 10 μg in 
1 ml 1% ammonium bicarbonate, 1 mM CaCl2, 0.02% Tween 20) was 
digested with trypsin (Worthington) at an enzyme/substrate 
mass ratio of 1:20 for 16 h at 37°C. RP-HPLC of the digest 
showed that trypsin resulted in little further digestion of 
the pepsin-treated material. Sequence analysis of a portion 
of this material resulted in 33 amino acid assignments 
(peptide 1, Figure 1). Pepsin and trypsin-treated core 
SCM-GnT I (about 8 μg in 1 ml 1% ammonium bicarbonate-0.02% 
Tween 20) was digested with thermolysin (Sigma) at an 
enzyme/substrate mass ratio of 1:20 for 2 h at 50°C and the 
digest was fractionated by RP-HPLC to yield peptides 2, 3, 
4, 7 and 8 (Figure 1). Core GnT I was extremely resistant 
to proteolysis even after reduction and alkylation 
indicating that the molecule is probably very compact. 

HPLC. RP-HPLC was carried out on a Hewlett-Packard 
liquid chromatograph (model 1090A) fitted with a diode array 
detector (model 1040A) (Simpson et al (1988) Eur. J. 
Biochem., vol. 176, 187-197). A Brownlee RP-300 column 
(30-nm pore size, 7-μm diameter dimethyloctyl silica 
particles packed into a stainless steel cartridge, 30 x 2.1 
mm i.d.; Brownlee Laboratories, Santa Clara, CA, USA) was 
used for all peptide separations. 

Amino Acid Sequence Analysis. Automated amino acid 
sequence analysis of GnT I and derived peptides was 
performed with Applied Biosystems sequencers (models 470A 
and 477A) equipped with on-line phenylthiohydantoin (PTH) 
amino acid analyzers (model 120A). Polybrene (Klapper et al 
(1978) Anal. Biochem., vol. 85, 126-131) was used as a 
carrier. 

Oligonucleotides and cDNA Synthesis. Oligonucleotides 
were synthesized on a Pharmacia automated oligonucleotide 
synthesizer at the Hospital for Sick Children-Pharmacia 
Biotechnology Service Centre. Total RNA was prepared from 
rabbit liver by the method of Chirgwin et al (Chirgwin et al 
(1979) Biochemistry, vol. 18, 5294-5299; Ausubel et al 
(1990) Current Protocols in Molecular Biology, Media, 
PA: Greene Publishing Associates and John Wiley and Sons). 
Poly(A)+RNA was prepared by oligo(dT) chromatography (Aviv 
et al (1972) Proc. Natl. Acad. Sci. USA, vol. 69, 1408-1412) 
using the mRNA Purification Kit supplied by Pharmacia. 
Single-stranded cDNA synthesis was performed using the 
RiboClone cDNA Synthesis System (Promega) with the following 
modifications. Total rabbit liver RNA (20 ng) in a volume 
of 5.5 μl was heated at 65°C for 3 min followed by cooling 
on ice for 5 min. The following reagents were added to a 
final volume of 50 μl: 50 mM Tris-HCl, pH 8.3; 0.15 M KCl; 
10 mM MgCl2; 2 mM dithiothreitol (DTT); each dNTP at 0.4 mM; 
40 units of RNasin (Promega); 2 mM sodium pyrophosphate; a 
mixture of the three anti-sense oligonucleotide primers 2A, 
3A and 6A (Figure 1) at concentrations of 50 nM each; 20 
units of AMV reverse transcriptase and 15 units of murine 
leukemia virus reverse transcriptase. Incubation was at 
42°C for 2 hr. The reaction mixture was treated with NaOH 
(0.25 N final concentration) for 5 min at room temperature 
to destroy RNA. The solution was then heated at 65°C for 
1 min followed by cooling on ice for 5 min and neutralized 
with HCl (0.25 N final concentration). This cDNA 
preparation was used directly in the PCR reaction. 

Amplification of cDNA. PCR was carried out in a total 
volume of 0.1 ml containing 50 mM KCl, 10 mM Tris-HCl (pH 
8.3), 1.5 mM MgCl2, 0.01% gelatin, each of the four dNTP at 
0.2 mM, 0.5 μM of each oligonucleotide in six paired 
combinations of oligonucleotide primers (2S-3A, 2S-6A, 
3S-2A, 3S-6A, 6S-2A, 6S-3A, Figure 1), 10 μl of RNA-free 
rabbit liver cDNA (see above), 2.5 units of Thermus 
aquaticus (Taq) polymerase (Perkin-Elmer/Cetus) and 0.1 ml 
of mineral oil. The samples were placed in an automated 
heating/cooling block (DNA Thermal Cycler, Perkin-Elmer) 
programmed for a temperature-step cycle of 94°C (0.5 min), 
50°C (1 min) and 72°C (2 min) for a total of 40 cycles 
followed by a 10-minute extension at 72°C after the final 
cycle. DNA from the PCR reactions was purified with 
GeneClean (Bio 101, Inc.) and analyzed by electrophoresis in 
a 1% agarose gel containing ethidium bromide (0.5 μg/ml). 
Two PCR products (0.45 and 0.50 kb) were detected and were 
purified from a 1% agarose gel by GeneClean. The DNA ends 
were filled in with T4 DNA polymerase (Moremen (1989) Proc. 
Natl. Acad. Sci. USA, vol. 86(14), 5276-5280) and the blunt 
ends were ligated into the SmaI site of pGEM-7z (Promega). 
The recombinant plasmid was amplified in E. coli XL1-Blue 
cells and purified. The plasmid was used for sequencing and 
to prepare a labelled probe for screening of a cDNA library. 
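
The temperature-step cycle described above can be written down as data, which makes the total programmed hold time easy to compute. The sketch below is only illustrative: the class and constant names are hypothetical, it does not correspond to any thermal cycler's actual programming interface, and it ignores the ramp time between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float    # block temperature in degrees Celsius
    minutes: float   # hold time at that temperature


# Values quoted in the text for the PCR program.
CYCLE = (Step(94, 0.5), Step(50, 1.0), Step(72, 2.0))
N_CYCLES = 40
FINAL_EXTENSION = Step(72, 10.0)


def total_hold_minutes(cycle, n_cycles, final_step) -> float:
    """Sum of programmed hold times, excluding ramping between steps."""
    return n_cycles * sum(step.minutes for step in cycle) + final_step.minutes


total = total_hold_minutes(CYCLE, N_CYCLES, FINAL_EXTENSION)
print(f"programmed hold time: {total:g} min")
```

The real run time would be longer than this figure, since heating and cooling between steps are not included.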

Screening of rabbit liver cDNA library in λgt10. The 
recombinant plasmid containing pGEM-7z and 0.5 kb PCR 
product (see above) was cut with BamHI and used to generate 
a riboprobe (0.5 kb) with the Promega Riboprobe Gemini II 
Core System. The reaction contained in a total volume of 
25 μl: 32 mM Tris-HCl, pH 7.5; 5 mM MgCl2; 2 mM spermidine; 
8 mM sodium chloride; 8 mM DTT; 40 units RNasin; 0.4 mM of 
each of ATP, GTP and UTP; 5 μl [α-32P]CTP (800 Ci/mmole); 
1 μg of BamHI-cut pGEM-7z/PCR-product recombinant plasmid; 
and 2 units T7 RNA polymerase. Incubation was at 40°C for 
2 hr. RNase-free DNase I (10 units) was added followed by 
incubation at room temperature for 15 min. Buffer (80 μl of 
50 mM Tris-HCl, pH 7.4; 4 mM EDTA; 300 mM NaCl; 0.1% SDS) 
and tRNA (20 μg) were added followed by extraction with 
phenol-chloroform-isoamyl alcohol (25:24:1, v/v). The 
labelled RNA probe was desalted over a Sephadex G-50 column 
(Nick Column, Pharmacia). 

A rabbit liver cDNA library in λgt10 (5'-stretch, Cat. 
No. TL 1006a from Clontech, EcoRI cloning site) was 
propagated in E. coli LE392 host cells and 10* plaques were 
screened by standard plaque hybridization techniques 
(Maniatis et al (1982) Molecular Cloning: a laboratory 
manual, Cold Spring Harbor, N.Y.: Cold Spring Harbor 
Laboratory) using the above riboprobe. Following fixation 
of DNA to nitrocellulose membranes, the membranes were 
washed for 1 hr at 45°C in 50 mM Tris-HCl, pH 8.0/1 M NaCl/1 
mM EDTA/0.1% SDS. Membranes were prehybridized at 50°C for 
2 hr in 1 M NaCl/50 mM sodium phosphate, pH 6.5/0.1% SDS/50% 
freshly deionized formamide/1% glycine/0.5% Blotto/5 mM 
EDTA/1% yeast total RNA. Riboprobe (5 x 10* cpm/ml 
hybridization solution) was added and hybridization was 
carried out at 50°C overnight. Membranes were washed in 
2XSSC/0.1% SDS twice for 5 min at room temperature and twice 
for 15 min at 50°C. Positive isolates were identified by 
autoradiography and were plaque-purified. DNA was purified 
from phage lysates, digested with EcoRI, and cDNA inserts 
were analyzed by agarose gel electrophoresis. The largest 
cDNA insert obtained was 1.6 kb; it was subcloned into the 
EcoRI site of pGEM-7z (Promega) by standard methods 
(Maniatis et al (1982) Molecular Cloning: a laboratory 
manual, Cold Spring Harbor, N.Y.: Cold Spring Harbor 
Laboratory) and the recombinant plasmids were transfected 
into E. coli XL1-Blue. Colonies containing the recombinant 
plasmid were selected and amplified, and plasmid DNA was 
purified by CsCl gradient centrifugation (Ausubel et al 
(1990) Current Protocols in Molecular Biology, Media, 
PA: Greene Publishing Associates and John Wiley and Sons). 

The cDNA library was re-screened as described above 
using an 80 bp riboprobe prepared from the 5'-end of the 
1.6 kb clone. The largest cDNA insert obtained was 3.0 kb. 
This insert was sub-cloned into pGEM-7z as described above 
and plasmid DNA was purified by CsCl gradient centrifugation 
(Ausubel et al (1990) Current Protocols in Molecular 
Biology, Media, PA: Greene Publishing Associates and John 
Wiley and Sons), to obtain pGEM-7z-rcgntl. 

DNA Sequencing. Two colonies of the pGEM-7z/ 
PCR-product recombinant plasmid (see above) containing 
inserts in opposite directions were sequenced directly by 
the single-strand dideoxynucleotide-chain-termination method 
(Sanger et al, Proc. Natl. Acad. Sci. USA, vol. 74, 
5463-5467) using deoxyadenosine 5'-[α-[35S]thio] 
triphosphate, Sequenase (United States Biochemical) and the 
forward primer for pGEM-7z. The 1.6 and 3.0 kb clones were 
sequenced by the Erase-a-Base System (Promega) and the 
single-strand dideoxynucleotide-chain-termination method. 
Both DNA strands were sequenced by using colonies in which 
the inserts were present in opposite directions. Plasmid 
DNA (12 μg) was cut with SphI to generate a 5'-overhang and 
XbaI to generate a 3'-overhang. The cut DNA was digested 
with exonuclease III (Erase-a-Base System, Promega) for 
varying lengths of time followed by S1 nuclease digestion. 
The DNA ends were blunt-ended with the Klenow fragment of 
E. coli DNA polymerase I and the DNA was circularized with 
T4 DNA ligase. The ligation mixtures were transfected into 
competent XL1-Blue cells. Miniplasmid preparations were 
carried out on about 5-10 subclones from each exonuclease 
III time point and were cut with BamHI and AatII to 
determine DNA size. Colonies with appropriate deletions 
were amplified and incubated with M13K07 helper phage at 
37°C for 1 hr followed by amplification in the presence of 
kanamycin (70 μg/ml) for 6 hr at 37°C. Single-stranded DNA 
was produced by the helper phage and excreted into the 
medium. The ss-DNA was purified from the medium by 
polyethylene glycol precipitation and sequenced by the 
dideoxynucleotide chain-termination method using 
deoxyadenosine 5'-[α-[35S]thio]triphosphate, Sequenase 
(United States Biochemical) and the forward primer for 
pGEM-7z. 

RNA Hybridization. Rabbit liver poly(A)+RNA (5 μg) was 
denatured in 50% (v/v) formamide/6% (v/v) formaldehyde 
buffer at 65°C and was resolved by gel electrophoresis in a 
1% agarose gel containing 6% (v/v) formaldehyde. The RNA 
was transferred to a nitrocellulose filter and the filters 
were hybridized with the 32P-labelled 0.5 kb PCR riboprobe 
(see above) followed by autoradiography. The specific 
activity of the probe was about 10*^ dpm/ng and the 
hybridization solution contained about 10^ dpm/ml. 

In vitro transcription and translation. The 
recombinant plasmid containing pGEM-7z (Promega) and the 
2.5 kb GnT I cDNA insert (rc2500, Figure 2) (pGEM-7z-rcgntl) 
was cut with SphI to generate linear plasmid. RNA was 
transcribed using the SP6 RNA polymerase promoter and 
initiation site present in pGEM-7z. RNA synthesis was 
carried out at 40°C for 1 hr in a total volume of 50 μl 
containing 40 mM Tris-HCl (pH 7.5), 6 mM MgCl2, 2 mM 
spermidine, 10 mM NaCl, 10 mM DTT, 40 units RNasin 
(Promega), 0.5 mM of each of ATP, UTP and CTP, 0.1 mM GTP, 
0.5 mM m7G(5')ppp(5')G (Pharmacia), 10 units SP6 RNA 
polymerase and 10 ng linearized plasmid. Control 
incubations were carried out in the absence of plasmid or 
with a linearized pGEM-7z recombinant plasmid containing a 
non-coding insert. The reaction mixture was extracted twice 
with phenol-chloroform-isoamyl alcohol (25:24:1, v/v) 
followed by precipitation with cold ethanol. 

Protein synthesis (translation) was carried out at 30°C 
for 1 hr in a total volume of 50 μl containing all 20 amino 
acids (1 mM each), 20 units of RNasin, RNA as prepared 
above, and buffer and rabbit reticulocyte lysate as supplied 
by Promega (Olliver et al (1984) "In vitro translation of 
messenger RNA in a rabbit reticulocyte lysate cell-free 
system", in: Walker, J. M., ed., Methods in Molecular 
Biology, Nucleic Acids, Clifton, N.J.: Humana Press, 
145-155). Non-radioactive amino acids were used when the 
products of translation were assayed for GnT I activity (see 
below). Separate incubations were carried out with 
L-[35S]-methionine (1000 Ci/mmole; 90 μCi/incubation) 
replacing non-radioactive Met; these incubations were 
analyzed by SDS-polyacrylamide gel electrophoresis followed 
by autoradiography. 

GnT I was assayed (Schachter (1989) Methods Enzymol., 
vol. 179, 351-396; Brockhausen et al (1988) Biochem. Cell 
Biol., vol. 66, 1134-1151) in a total volume of 40 μl 
containing 20 mM MnCl2, bovine serum albumin (1 mg/ml), 0.1% 
(v/v) Triton X-100, 0.1 M MES (pH 6.1), 0.5 mM UDP-N-[1- 
14C]acetyl-D-glucosamine (2.2 mCi/mmole), 0.125 M GlcNAc and 
0.6 mM Manα1-6(Manα1-3)Manβ-hexyl (a kind gift from Dr. Hans 
Paulsen, University of Hamburg, Hamburg, Federal Republic of 
Germany). Incubations were at 37°C for 2 and 16 hr. The 
reaction was stopped with 0.5 ml 20 mM sodium tetraborate/2 
mM EDTA and was passed through a small column of AG1X8 
(Cl-form, 100-200 mesh, equilibrated with water) to remove 
radioactive nucleotide-sugar. The eluate was applied to a 
Sep-Pak C-18 reverse phase cartridge (Waters) conditioned 
with 20 ml methanol and 20 ml water. The cartridge was 
washed with 20 ml water and radioactive product was eluted 
with 5.0 ml methanol (Palcic et al (1988) Glycoconjugate 
J., vol. 5, 49-63). An aliquot was counted directly and the 
remainder was analyzed by HPLC on a C-18 reverse phase 
column using acetonitrile-water (12:88) as the mobile phase 
(Schachter et al (1989) Methods Enzymol., vol. 179, 351-396; 
Brockhausen et al (1988) Biochem. Cell Biol., vol. 66, 
1134-1151). Product co-eluted with a standard preparation 
of Manα1-6(GlcNAcβ1-2Manα1-3)Manβ-hexyl at 36 min. 
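
Counts recovered in the methanol eluate can be converted to an amount of product using the specific activity of the donor quoted above. The following Python sketch shows one way to do that arithmetic; it assumes counts have already been corrected to disintegrations per minute (counting efficiency is not given in the text), and the function names are illustrative.

```python
# 1 microcurie = 3.7e4 Bq = 2.22e6 disintegrations per minute.
DPM_PER_MICROCURIE = 2.22e6

# Values quoted in the text for the assay.
SPECIFIC_ACTIVITY_MCI_PER_MMOL = 2.2   # UDP-N-[1-14C]acetyl-D-glucosamine
DONOR_CONC_MM = 0.5
ASSAY_VOLUME_UL = 40


def dpm_per_nmol(specific_activity_mci_per_mmol: float) -> float:
    """mCi/mmol equals uCi/umol, i.e. 1e-3 uCi per nmol."""
    return specific_activity_mci_per_mmol * 1e-3 * DPM_PER_MICROCURIE


def nmol_product(dpm: float) -> float:
    """Convert disintegrations per minute in the product into nmol GlcNAc."""
    return dpm / dpm_per_nmol(SPECIFIC_ACTIVITY_MCI_PER_MMOL)


# 1 mM is 1 nmol per microlitre, so this is the donor present per assay.
donor_nmol = DONOR_CONC_MM * ASSAY_VOLUME_UL
print(dpm_per_nmol(SPECIFIC_ACTIVITY_MCI_PER_MMOL), donor_nmol)
```

For example, `nmol_product(9768)` would correspond to about 2 nmol of GlcNAc transferred under these assumptions, out of the donor available in each incubation.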

Preparation of pGEX-2t-rcgntl. This plasmid was 
prepared from pGEM-7z-rcgntl by cutting out the insert 
rcgntl with EcoRI. Plasmid pGEX-2t (Pharmacia) was 
linearized with EcoRI and the insert was ligated into the 
plasmid by standard procedures. The recombinant plasmid was 
amplified in E. coli in the presence of ampicillin and 
purified by cesium chloride centrifugation. 

Amplification of cDNA. Three amino acid sequences 
(Figure 1) were chosen for the design of sense and 
anti-sense oligonucleotide primers to be used in the PCR 
amplification of rabbit liver cDNA. Deoxyinosine was 
substituted in positions where codon degeneracy was >2 
(Moremen (1989) Proc. Natl. Acad. Sci. USA, vol. 86(14), 
5276-5280); mixed pairs of bases were used in four positions 
in all three sequences giving a 16-fold mixture of sequences 
for every primer. Since we had no knowledge of the order of 
the peptides in the amino acid sequence, PCR was carried out 
with all six possible combinations of sense and anti-sense 
primers (2S-3A, 2S-6A, 3S-2A, 3S-6A, 6S-2A, 6S-3A, 
Figure 1). The products of the PCR reactions were analyzed 
by agarose gel electrophoresis (Figure 3). Primer-dependent 
products were obtained with two of the six incubations, 
i.e., 2S-6A (500 bp) and 3S-6A (450 bp). The complete 
nucleotide sequence for GnT I is shown in Figure 4. 
Oligonucleotide primers 2S and 3A are separated by only nine 
bases, thereby explaining the absence of PCR product with 
this combination. 
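
The primer bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the only inputs taken from the text are the three peptide labels (2, 3 and 6) and the four two-base mixed positions per primer.

```python
from itertools import permutations

# Peptide labels from Figure 1; each peptide yields one sense (S) and one
# anti-sense (A) primer. The helper names below are illustrative only.
PEPTIDES = ["2", "3", "6"]


def primer_pairs(peptides):
    """Pair each sense primer with the anti-sense primer of every other peptide."""
    return [f"{sense}S-{anti}A" for sense, anti in permutations(peptides, 2)]


def pool_size(mixed_positions, bases_per_position=2):
    """Number of distinct sequences in a primer pool with mixed-base positions."""
    return bases_per_position ** mixed_positions


pairs = primer_pairs(PEPTIDES)
print(pairs)         # six sense/anti-sense combinations
print(pool_size(4))  # four two-base positions give a 16-fold mixture
```

Because the order of the peptides in the protein was unknown, every ordered pairing has to be tried; only pairings whose sense primer lies upstream of the anti-sense primer can give a product.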

Sequence Analysis. The 1.6 kb clone contains 0.5 kb 
from the 3'-end of the coding region and the full 1.1 kb 
3'-untranslated region (rc1500, Figure 2). The 3.0 kb clone 
yielded a 2485 bp sequence (rc2500, Figure 2; Figure 4). We 
have shown that subcloning of the 3.0 kb DNA fragment in 
pGEM-7z results in deletion of a 0.5 kb DNA fragment near 
the 5'-end of the clone. Comparison of the cDNA sequence 
shown in Figure 4 with the sequence of human genomic DNA for 
GnT I (in preparation) has shown that this deleted 0.5 kb 
DNA fragment is not part of the GnT I gene; we do not know 
the origin of this DNA. 

The GnT I coding sequence has 1341 bp and codes for a 
membrane-bound protein of 447 amino acids (Mr 52,000). There 
is a single hydrophobic domain (bases 62 to 136) flanked by 
charged amino acids (Figure 4). Chou-Fasman rules (Chou et 
al (1978) Adv. Enzymol., vol. 47, 45-147) predict that this 
hydrophobic segment is capable of propagating an α-helix, as 
expected for a transmembrane domain. 
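
The coordinates quoted for the coding sequence can be cross-checked with simple arithmetic. The sketch below assumes 1-based base numbering with the initiation ATG starting at base 50 (as stated in the following paragraph) and a coding length that excludes the stop codon; the function names are hypothetical.

```python
ATG_POSITION = 50      # first base of the initiation codon (1-based)
CODING_LENGTH = 1341   # coding bases, stop codon assumed not included


def orf_summary(atg_position, coding_length):
    """Return (number of residues, last coding base) for an open reading frame."""
    if coding_length % 3 != 0:
        raise ValueError("coding length must be a whole number of codons")
    residues = coding_length // 3
    last_coding_base = atg_position + coding_length - 1
    return residues, last_coding_base


def residue_span(first_base, last_base, atg_position):
    """Map a base interval inside the ORF to 1-based residue numbers."""
    first = (first_base - atg_position) // 3 + 1
    last = (last_base - atg_position) // 3 + 1
    return first, last


residues, last_base = orf_summary(ATG_POSITION, CODING_LENGTH)
print(residues, last_base)                  # 447 1390
print(residue_span(62, 136, ATG_POSITION))  # (5, 29)
```

Under these assumptions the coding region ends at base 1390, which agrees with the 3'-untranslated region starting at base 1391, and the hydrophobic domain corresponds to residues 5 to 29 (25 residues).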

The presumptive initiation Met codon is at the ATG 
codon at position 50 which has an A at position 47, thereby 
fulfilling the requirements for an initiation codon (Kozak 
(1983) Microbiological Reviews, vol. 47, 1-45). All eight 
peptides shown in Figure 1 (a total of 103 amino acid 
residues) can be identified in the sequence (Figure 4); an 
additional five tentative assignments also match the 
sequence. GnT I purified from rabbit liver has a molecular 
weight of about 45 kDa (Nishikawa et al (1988) J. Biol. 
Chem., vol. 263, 8270-8281). The protein has no N-glycans 
since none of the nine Asn residues are in a typical 
Asn-X-Ser(Thr) sequence; we have previously shown that 
rabbit liver GnT I binds poorly to lectin/agarose columns 
(Nishikawa et al (1988) J. Biol. Chem., vol. 263, 
8270-8281). If there are no or few O-glycans, a 
catalytically active protein of 45 kDa can be derived by 
cleavage at about base position 215 (Figure 4). 

Comparison of the GnT I sequence with those of several 
previously cloned glycosyltransferases (Appert et al (1986) 
Biochem. Biophys. Res. Commun., vol. 139, 163-168; 
D'Agostaro et al (1989) Eur. J. Biochem., vol. 183, 211-217; 
Hollis et al (1989) Biochem. Biophys. Res. Commun., vol. 
162, 1069-1075; Joziasse et al (1989) J. Biol. Chem., vol. 
264, 14290-14297; Larsen et al (1989) Proc. Natl. Acad. Sci. 
USA, vol. 86, 8227-8231; Larsen et al (1990) J. Biol. Chem., 
vol. 265, 7055-7061; Masibay et al (1989) Proc. Natl. Acad. 
Sci. USA, vol. 86, 5733-5737; Masri et al (1988) Biochem. 
Biophys. Res. Commun., vol. 157, 657-663; Narimatsu et al 
(1986) Proc. Natl. Acad. Sci. USA, vol. 83, 4720-4724; Russo 
et al (1990) J. Biol. Chem., vol. 265, 3324-3331; Shaper et 
al (1986) Proc. Natl. Acad. Sci. USA, vol. 83, 1573-1577; 
Shaper et al (1988) J. Biol. Chem., vol. 263, 10420-10428; 
Shaper et al (1988) Biochimie, vol. 70, 1683-1688; Shaper 
et al (1990) Proc. Natl. Acad. Sci. USA, vol. 87, 791-795; 
Smith et al (1990) J. Biol. Chem., vol. 265, 6225-6234; 
Weinstein et al (1987) J. Biol. Chem., vol. 262, 
17735-17743) revealed no sequence homology, but GnT I appears 
to have a domain structure typical of these enzymes (Paulson 
et al (1989) J. Biol. Chem., vol. 264, 17615-17618). 
Searches of the GenBank nucleotide data base (release 62.0) 
with the coding region of GnT I and of the PIR Protein Data 
Base (release 23.0) with the GnT I amino acid sequence 
revealed no significant similarities to other sequences. 

The complete sequence has a long 3'-untranslated region 
(bases 1391-2479) containing the consensus polyadenylation 
signal AATAAA at position 2435 (Tosi et al (1981) Nucleic 
Acids Research, vol. 9, 2313-2323). Long 3'-untranslated 
regions are typical of the known glycosyltransferase genes 
and may be a feature present in other Golgi-localized 
enzymes (Moremen (1989) Proc. Natl. Acad. Sci. USA, vol. 
86(14), 5276-5280). 

Northern Blot Analysis. The PCR riboprobe was used to 
determine the size of mRNA in rabbit liver. A major band 
was detected at about 3.0 kb with some smearing at lower 
molecular weights (data not shown), indicating that the 2.5 
kb cDNA clone (Figure 4) may not be full-length. 

In Vitro Transcription and Translation. Transcription 
of the linearized pGEM-7z/2.5 kb GnT I cDNA recombinant 
plasmid (pGEM-7z-rcgntl) followed by translation in the 
presence of L-[35S]Met resulted in the appearance of a 
strong radioactive 52 kDa band on SDS-polyacrylamide gel 
electrophoresis; this band was not seen in control 
incubations lacking plasmid or containing control plasmid 
(Figure 5). The molecular weight matches the prediction for 
the open reading frame shown in Figure 4. Table 1 shows the 
results of GnT I assays carried out on the 
transcription-translation incubations. The incubation 
containing the pGEM-7z/2.5 kb GnT I cDNA recombinant plasmid 
(pGEM-7z-rcgntl) has appreciable GnT I activity whereas both 
controls show low activity. It is concluded that the 2.5 kb 
sequence shown in Figure 4 can code for the synthesis of 
catalytically active GnT I. 
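
One way to read the comparison in Table 1 is as activity relative to the control-plasmid incubation. The sketch below uses the Table 1 values (nmoles/total transcription incubation); the dictionary layout and function name are illustrative assumptions, and the fold values are not figures reported in the text.

```python
# Table 1 values: (control plasmid, 2.5 kb GnT I cDNA plasmid).
TABLE_1 = {
    "Sep-Pak 2 hr": (0.04, 0.41),
    "Sep-Pak 16 hr": (0.21, 1.05),
    "HPLC 16 hr": (0.29, 1.32),
}


def fold_over_control(control, sample):
    """Ratio of product formed with the cDNA plasmid to the control plasmid."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return sample / control


folds = {assay: fold_over_control(*values) for assay, values in TABLE_1.items()}
for assay, fold in folds.items():
    print(f"{assay}: {fold:.1f}-fold over control")
```

With these numbers the cDNA-containing incubation gives roughly 4.5- to 10-fold more product than the control in each assay.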



                           TABLE 1

   In vitro transcription-translation of rabbit GnT I cDNA

                                     GnT I product
                        (nmoles/total transcription incubation)

Conditions of             Sep-Pak assays         HPLC assays
transcription             2 hr       16 hr       16 hr

No plasmid                0.04       0.21

Control plasmid           0.04       0.21        0.29

2.5 kb GnT I cDNA         0.41       1.05        1.32
(pGEM-7z-rcgntl)

II. Human GnT I: 

The polymerase chain reaction (PCR) was used to obtain 
a 0.5 kb ds-cDNA representing the carboxy-terminal half of 
the rabbit liver GnT I coding sequence, and this DNA 
fragment was labelled by the random primer technique. The 
preparation of this probe is described above. 

The rabbit cDNA probe was used to screen 10^ plaques 
from an amplified human genomic DNA library in λEMBL3 
prepared from chromosomal DNA from chronic myeloid leukemia 
cells. Positive plaques (23) were purified and phage DNA 
was subjected to restriction enzyme analysis using the 0.5 
kb rabbit cDNA as probe. All 23 preparations gave the same 
Sau3A 0.4 kb fragment. This fragment showed 87% base 
similarity and 90% amino acid sequence similarity to the 
rabbit GnT I carboxy-terminal sequence. Inserts of 13 and 
15 kb were cut from two of the human genomic DNA clones with 
SalI and subcloned into plasmid pGEM-5zf(+) (Promega). 
Restriction maps of the two inserts show that they represent 
an overlapping 18 kb DNA sequence. 

The coding sequence was located in a 4.0 kb fragment of 
human genomic DNA by screening restriction maps with a probe 
containing the entire coding region of the rabbit GnT I 
cDNA. This 4.0 kb DNA fragment was cut out by restriction 
enzymes, subcloned into the sequencing vector pGEM-5zf(+) 
to yield pGEM-5z-hggntl, and sequenced. Transfection of the 
gene into Lec 1 Chinese hamster ovary cell mutants (which 
lack GnT I activity) results in the expression of GnT I 
activity, indicating the presence of a functional promoter 
5' upstream of the transcription start site. 

The 4 kb sequence contains an open reading frame coding 
for a protein with 445 amino acids (2 fewer than the rabbit 
enzyme). The DNA contains a functional promoter and an 
intronless gene. The similarity between the rabbit and 
human enzymes is 85% for the nucleotide coding sequences and 
over 90% for the amino acid sequences. 
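
A percent-identity figure of this kind can be illustrated with a position-by-position comparison of two equal-length segments. The sketch below is a simplification (no gaps, no alignment step) and does not reproduce how the figures above were derived; the two 15-residue segments are the third rows of formula I (rabbit) and formula II (human) as listed in the claims, chosen only because they differ at several positions.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (no gap handling)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Residues 31-45 of formula I (rabbit) and formula II (human), three-letter code.
rabbit = "ARG PRO VAL PRO SER ARG LEU PRO SER ASP ASN ALA LEU ASP ASP".split()
human = "ARG PRO ALA PRO GLY ARG PRO PRO SER VAL SER ALA LEU ASP GLY".split()

print(percent_identity(rabbit, human))  # 60.0 for this divergent segment
```

This segment is less conserved than the protein as a whole; the first 30 residues of the two formulas, for example, are identical.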

Obviously, numerous modifications and variations of the 
present invention are possible in light of the above 
teachings. It is therefore to be understood that, within 
the scope of the appended claims, the invention may be 
practiced otherwise than as specifically described herein. 
The references cited in the specification are incorporated 
herein by reference. 

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE 
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS: 

1. An isolated DNA sequence encoding a protein having the 
amino acid sequence of formula I: 

MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE 
LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR 
ARG PRO VAL PRO SER ARG LEU PRO SER ASP ASN ALA LEU ASP ASP 
ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP 
ALA GLU VAL GLU LEU GLU ARG GLN ARG GLY LEU LEU GLN GLN ILE 
ARG GLU HIS HIS ALA LEU TRP SER GLN ARG TRP LYS VAL PRO THR 
ALA ALA PRO PRO ALA GLN PRO HIS VAL PRO VAL THR PRO PRO PRO 
ALA VAL ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL 
ARG ARG CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU 
LEU PHE PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR 
ALA GLN VAL ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG 
GLN PRO ASP LEU SER ASN ILE ALA VAL GLN PRO ASP HIS ARG LYS 
PHE GLN GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU 
GLY GLN ILE PHE HIS ASN PHE ASN TYR PRO ALA ALA VAL VAL VAL 
GLU ASP ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE GLN 
ALA THR TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL 
SER ALA TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP SER SER 
LYS PRO GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY 
TRP LEU LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP 
PRO LYS ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG 
LYS GLY ARG ALA CYS VAL ARG PRO GLU ILE SER ARG THR MET THR 
PHE GLY ARG LYS GLY VAL SER HIS GLY GLN PHE PHE ASP GLN HIS 
LEU LYS PHE ILE LYS LEU ASN GLN GLN PHE VAL PRO PHE THR GLN 
LEU ASP LEU SER TYR LEU GLN GLN GLU ALA TYR ASP ARG ASP PHE 
LEU ALA ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL 
ARG THR ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR 
THR GLY ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL 
MET ASP ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY 
ILE VAL THR PHE LEU PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO 
PRO GLN THR TRP ASP GLY TYR ASP PRO SER TRP THR. 
2. The DNA sequence of Claim 1, having the nucleotide 
sequence of formula III: 

atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc 
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca 
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat 
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat 
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att 
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act 
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca 
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc 
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag 
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca 
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg 
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag 
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg 
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg 
gag gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag 
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg 
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt 
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc 
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg 
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga 
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca 
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat 
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag 
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc 
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg 
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac 
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc 
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc 
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc 
cct cag act tgg gat ggc tat gat cct agt tgg act. 
3. The DNA sequence of Claim 1, having the nucleotide 
sequence of formula IV: 

gaattccggc aagtcatacc tttgcctgcc ctcccctgtg ggggccagg 
atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc 
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca 
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat 
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat 
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att 
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act 
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca 
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc 
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag 
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca 
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg 
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag 
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg 
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg 
gaa gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag 
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg 
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt 
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc 
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg 
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga 
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca 
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat 
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag 
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc 
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg 
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac 
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc 
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc 
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc 
cct cag act tgg gat ggc tat gat cct agt tgg act 

taacagctcc tgcctgtccc ttctgggctc cttccttgca atttcatgat ctaagatggg 
accgtagtcc ctgggctgca ttgtcttttc tgtctttccc tcttgggtcc attttttttt 
ttttcttttt tgagtggcat ttgaatacac agatgacaag gtgagggttc ttttgttaaa 
ggagttagat cagggaaagc attctgctgt ctgttgggta tcaagcagca aaccactgtg 
tgatagggga agaatgggct ttttggggcc agaaatatcc atgttctgag tttttctctt 
aggtcatctg cagaggagtt ggcaacttta gctttcttaa ccaggccttt tctttctgac 
ctgagagcca gggcatgaga cttcttgttc atgctccttt ttaccttccc ctaataaggg 
tctgggctac aggagaagtg aacatattgt ggccagaata atactaacca gaggggcctc 
attgtcagag tctaggtgca gttattgggt tgtcagagtt aatgccttct gttcttcttt 
ccttattcct gacttctgtc agctcttctt tctttgcagc ctagcaattt ttggttctaa 
gatgaaaaat gaagaggaaa agaaatattc gcacccagct attgggagaa aggtagtggg 
aaaaaaacet cattgtacca ctrcaaagag acactcttga cctcttcctt tctaaaaatt 
agtcccctcc ctgttgcttc aggagaatgc tgtgctggtc agttctgtgt gatccttctt 
ccctgagttt tatacacagg ctcctcccta aggctgtggc ttctggtggc cctcctgaca 
taagttacag tggccaagac caggacaact ccggccatga gctaagtcct gcctaccttc 
tccaaaacat tcccatgtcc tcacaggcta ggatgcagat gttggttgga gaggaatttg 
tgtgtgtgtg tgtgtgtgtg tgtgttttct tgcctgacct cagtttcatg gatgaaaagt 
ggaagctaca gaattatttt caaaaataaa ggctgaattg tctgaaaaaa aaaaaaaaaa 
aaaaaaccgg aattc. 
4. An isolated DNA sequence encoding a protein having 
the amino acid sequence of formula II: 

MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE 

LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR 

ARG PRO ALA PRO GLY ARG PRO PRO SER VAL SER ALA LEU ASP GLY 

ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP 

ALA GLU VAL GLU LEU GLU ARG ARG ARG GLY LEU LEU GLN GLN ILE 

GLY ASP ALA LEU SER SER GLN ARG GLY ARG VAL PRO THR ALA ALA 

PRO PRO ALA GLN PRO ARG VAL PRO VAL THR PRO ALA PRO ALA VAL 

ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL ARG ARG 

CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU LEU PHE 

PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR ALA GLN 

ALA ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG GLN PRO 

ASP LEU SER SER ILE ALA VAL PRO PRO ASP HIS ARG LYS PHE GLN 

GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU GLY GLN 

VAL PHE ARG GLN PHE ARG PHE PRO ALA ALA VAL VAL VAL GLU ASP 

ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE ARG ALA THR 

TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL SER ALA 

TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP ALA SER ARG PRO 

GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY TRP LEU 

LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP PRO LYS 

ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG GLN GLY 

ARG ALA CYS ILE ARG PRO GLU ILE SER ARG THR MET THR PHE GLY 

ARG LYS GLY VAL THR HIS GLY GLN PHE PHE ASP GLN HIS LEU LYS 

PHE ILE LYS LEU ASN GLN GLN PHE VAL HIS PHE THR GLN LEU ASP 

LEU SER TYR LEU GLN ARG GLU ALA TYR ASP ARG ASP PHE LEU ALA 

ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL ARG THR 

ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR THR GLY 

ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL MET ASP 

ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY ILE VAL 

THR PHE GLN PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO PRO PRO. 

5. The DNA sequence of Claim 4, having the nucleotide 

sequence of formula V: 

atgctgaa gaagcagtct gcagggcttg tgctgtgggg cgctatcctc tttgtggcct 
ggaatgccct gctgctcctc ttcttctgga cgcgcccagc acctggcagg ccaccctcag 
tcagcgctct cgatggcgac cccgccagcc tcacccggga agtgattcgc ctggcccaag 
acgccgaggt ggagctggag cgcaggcgtg ggctgctgca gcagatcggg gatgccctgt 
cgagccagcg ggggagggtg cccaccgcgg cccctcccgc ccagccgcgt gtgcctgtga 
cccccgcgcc ggcggtgatt cccatcctgg tcatcgcctg tgaccgcagc actgttcggc 
gctgcctgga caagctgctg cattatcggc cctcggctga gctcttcccc atcatcgtta 
gccaggactg cgggcacgag gagacggccc aggccatcgc ctcctacggc agcgcggtca 
cgcacatccg gcagcccgac ctgagcagca ttgcggtgcc gccggaccac cgcaagttcc 
agggctacta caagatcgcg cgccactacc gctgggcgct gggccaggtc ttccggcagt 
tccgcttccc cgcggccgtg gtggtggagg atgacctgga ggtggccccg gacttcttcg 
agtactttcg ggccacctat ccgctgctga aggccgaccc ctccctgtgg tgcgtctcgg 
cctggaatga caacggcaag gagcagatgg tggacgccag caggcctgag ctgctctacc 
gcaccgactt tttccctggc ctgggctggc tgctgttggc cgagctctgg gctgagctgg 
agcccaagtg gccaaaggcc ttctgggacg actggatgcg gcggccggag cagcggcagg 
ggcgggcctg catacgccct gagatctcaa gaacgatgac ctttggccgc aagggtgtga 
cgcacgggca gttctttgac cagcacctca agtttatcaa gctgaaccag cagtttgtgc 
acttcaccca gctggacctg tcttacctgc agcgggaggc ctatgaccga gatttcctcg 
cccgcgtcta cggtgctccc cagctgcagg tggagaaagt gaggaccaat gaccggaagg 
agctggggga ggtgcgggtg cagtatacgg ggagggacag cttcaaggct ttcgccaagg 
ctctgggtgt tatggatgac cttaagtcgg gggttccgag agctggctac cggggtattg 
tcaccttcca gttccggggc cgccgtgtcc acctggcgcc cccaccgacg tgggagggct 
atgatcctag ctggaat. 

6. The DNA sequence of Claim 4, having the nucleotide 
sequence of formula VI: 

aagttttgaa tgtttaagtt tatttaagtt tatttctaaa tattttctca tttctctggc 
ttttgtaagt agggttttct catccatgtt ttcttctcat gagttatttg tggatatgaa 
ggctatccat tagtatatgt tgatttttat attacacttc cttgctcagt tcattattga 
ttctttttga gttttccagg catattctca caagtaaaga taatagaaat agtttgcttc 
ctttccactt ctgctttgaa tttttttttc ttggttcatt tgcattggct gcttcctcca 
gcaaaatgtt aaataaccct ggagatgatg ggcaacttcg ttttgctcct gacattcgtg 
gggtgcctct ggtgcttccc tgttggtaag gggttaactg tagccctgag gtgggacatt 
tgattttaaa aatcagtcat cttggggcgc ttaggttaga ggaatggtag gcagatgctg 
tcactccttg cccctcccct cctccttccc acctggaggg gaaatgaaat ctgacaggta 
gaaagagggg agttggggtt ctttttctct ctccctccac cagcatcact ctctgcctct 
ccctcaaaaa tacgttcctg ggtcaggata tatgttgact ccctagagag ctctggagtc 
aaccrcctgg ccttcctcca ccctcactct tggccttttc ctgcccccat ttcctctacc 
tgtggggcat ggagccacga gcctttgtgt gacggtttgc tttctctctc ctgtctttag 
gtgcatggct gcctcctaat cccatagtcc agaggaggca tccctaggac tgcgggcaag 
ggagccgcaa gcccagggca gccttgaacc gtcccctggc ctgccctccg gtgggggcca 
ggatgctgaa gaagcagtct gcagggcttg tgctgtgggg cgctatcctc tttgtggcct 
ggaatgccct gctgctcctc ttcttctgga cgcgcccagc acctggcagg ccaccctcag 
tcagcgctct cgatggcgac cccgccagcc tcacccggga agtgattcgc ctggcccaag 
acgccgaggt ggagctggag cgcaggcgtg ggctgctgca gcagatcggg gatgccctgt 
cgagccagcg ggggagggtg cccaccgcgg cccctcccgc ccagccgcgt gtgcctgtga 
cccccgcgcc ggcggtgatt cccatcctgg tcatcgcctg tgaccgcagc actgttcggc 
gctgcctgga caagctgctg cattatcggc cctcggctga gctcttcccc atcatcgtta 
gccaggactg cgggcacgag gagacggccc aggccatcgc ctcctacggc agcgcggtca 
cgcacatccg gcagcccgac ctgagcagca ttgcggtgcc gccggaccac cgcaagttcc 
agggctacta caagatcgcg cgccactacc gctgggcgct gggccaggtc ttccggcagt 
ttcgcttccc cgcggccgtg gtggtggagg atgacctgga ggtggccccg gacttcttcg 
agtactttcg ggccacctat ccgctgctga aggccgaccc ctccctgtgg tgcgtctcgg 
cctggaatga caacggcaag gagcagatgg tggacgccag caggcctgag ctgctctacc 
gcaccgactt tttccctggc ctgggctggc tgctgttggc cgagctctgg gctgagctgg 
agcccaagtg gccaaaggcc ttctgggacg actggatgcg gcggccggag cagcggcagg 
ggcgggcctg catacgccct gagatctcaa gaacgatgac ctttggccgc aagggtgtga 
cgcacgggca gttctttgac cagcacctca agtttatcaa gctgaaccag cagtttgtgc 
acttcaccca gctggacctg tcttacctgc agcgggaggc ctatgaccga gatttcctcg 
cccgcgtcta cggtgctccc cagctgcagg tggagaaagt gaggaccaat gaccggaagg 
agctggggga ggtgcgggtg cagtatacgg ggagggacag cttcaaggct ttcgccaagg 
ctctgggtgt tatggatgac cttaagtcgg gggttccgag agctggctac cggggtattg 
tcaccttcca gttccggggc cgccgtgtcc acctggcgcc cccaccgacg tgggagggcc 
atgatcctag ctggaattag cacctgcctg tccttcctgg gccccttctt gccacatcat 
gagctgaggt gaccacagtc cccaggctgc atcggcctgc ctgcgtttcc ctcttaggtg 
cattcatctt tttgattttt ccgagtggca tttaagtgca caaacgacaa caagaggatt 
attctcccgt tctcaaggga gtcagatcag gggaactact ctagggtatg ttgcggggta 
ttaagcagga aaacactgtg tggtgggggg cactgggctt gttggggcca caaatgtcca 
cgtcctgagc tttctcctgg agcatgtgca gagagtttgg caacgttcgc tctcttgacc 
agaccccttc tccctgactg gctcttccag ccaggcacga gccctccttc tatacctgct 
ccccttccca gtggggactg agttatggga gaaggggaca tatttgtggc caaaatgata 
ctaaccaaag gggcttcctt gtcagggcct ggtggagttg gtgggtcatc ggggctcact 
gcctcctgcc cttctctcct gtctgacccc cacttagccc ttctctcctt gcagcctagc 
agtttatagt tctgagatgg aaagttgaag ggggcaagca agacctctcc tcagcccatg 
cccagctgtc aggagagagg tgcagggagg aaggccttgt gctgggacaa cctctctctt 
gccttacctt cagagaggac tatgccctga cccctccttt ctgaaaatca gtgccctccc 
tgttgctcta ggaggctcct gctggcttgg tagaagacag aattcgatcc gcctgtccct 
ttttcccctg gggtttgaca cacaggctcc tctcagcatg aggtggagca gtgaccaggt 
ggagcagtga ccaggacgcc tctggcccag tgctgcccag cctccccgcc cgctcccagg 
cgccccatgt cctcacaggc caggacgcca tggcggccgg gagcatgcga. 

7. A plasmid, comprising a DNA sequence encoding a protein 
having the amino acid sequence of formula I: 

MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE 
LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR 
ARG PRO VAL PRO SER ARG LEU PRO SER ASP ASN ALA LEU ASP ASP 
ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP 
ALA GLU VAL GLU LEU GLU ARG GLN ARG GLY LEU LEU GLN GLN ILE 
ARG GLU HIS HIS ALA LEU TRP SER GLN ARG TRP LYS VAL PRO THR 
ALA ALA PRO PRO ALA GLN PRO HIS VAL PRO VAL THR PRO PRO PRO 
ALA VAL ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL 
ARG ARG CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU 
LEU PHE PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR 
ALA GLN VAL ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG 
GLN PRO ASP LEU SER ASN ILE ALA VAL GLN PRO ASP HIS ARG LYS 
PHE GLN GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU 
GLY GLN ILE PHE HIS ASN PHE ASN TYR PRO ALA ALA VAL VAL VAL 
GLU ASP ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE GLN 
ALA THR TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL 
SER ALA TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP SER SER 
LYS PRO GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY 
TRP LEU LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP 
PRO LYS ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG 
LYS GLY ARG ALA CYS VAL ARG PRO GLU ILE SER ARG THR MET THR 
PHE GLY ARG LYS GLY VAL SER HIS GLY GLN PHE PHE ASP GLN HIS 
LEU LYS PHE ILE LYS LEU ASN GLN GLN PHE VAL PRO PHE THR GLN 
LEU ASP LEU SER TYR LEU GLN GLN GLU ALA TYR ASP ARG ASP PHE 
LEU ALA ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL 
ARG THR ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR 
THR GLY ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL 
MET ASP ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY 
ILE VAL THR PHE LEU PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO 
PRO GLN THR TRP ASP GLY TYR ASP PRO SER TRP THR. 
8. The plasmid of Claim 7, wherein said DNA sequence has 
the formula III: 

atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc 
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca 
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat 
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat 
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att 
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act 
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca 
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc 
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag 
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca 
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg 
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag 
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg 
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg 
gag gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag 
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg 
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt 
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc 
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg 
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga 
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca 
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat 
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag 
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc 
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg 
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac 
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc 
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc 
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc 
cct cag act tgg gat ggc tat gat cct agt tgg act. 
9. The plasmid of Claim 7, wherein said DNA sequence has 
the formula IV: 

gaattccggc aagtcatacc tttgcctgcc ctcccctgtg ggggccagg 
atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc 
ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca 
cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat 
gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat 
gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att 
agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act 
gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca 
gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc 
cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag 
ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca 
gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg 
caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag 
ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg 
ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg 
gaa gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag 
gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg 
tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt 
aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc 
tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg 
ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga 
aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca 
ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat 
ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag 
ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc 
ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg 
agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac 
aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc 
atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc 
att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc 
cct cag act tgg gat ggc tat gat cct agt tgg act 
taacagctcc tgcctgtccc ttctgggctc cttccttgca atttcatgat ccaagatggg 
accgtagtcc ctgggctgca ttgtcttttc tgtctttccc tcttgggtcc attttttttt 
ttttcttttt tgagtggcat ttgaatacac agatgacaag gtgagggttc ttttgttaaa 
ggagttagat cagggaaagc attctgctgt ctgttgggta tcaagcagca aaccactgtg 
tgatagggga agaatgggct ttttggggcc agaaatatcc atgttctgag tttttctctt 
aggtcatctg cagaggagtt ggcaacttta gctttcttaa ccaggccttt tctttctgac 
ctgagagcca gggcatgaga cttcttgttc atgctccttt ttaccttccc ctaataaggg 
tctgggctac aggagaagtg aacatattgt ggccagaata atactaacca gaggggcctc 
attgtcagag tctaggtgca gttattgggt tgtcagagtt aatgccttct gttcttcttt 
ccttattcct gacttctgtc agctcttctt tctttgcagc ctagcaattt ttggttctaa 
gatgaaaaat gaagaggaaa agaaatattc gcacccagct attgggagaa aggtagtggg 
aaaaaaactt cattgtacca cttcaaagag acactcttga cctcttcctt trctaaaaatc 
agtcccctcc ctgttgcttc aggagaatgc tgtgctggtc agttctgtgt gatccttctt 
ccctgagttt tatacacagg ctcctcccta aggctgtggc ttctggtggc cctcctgaca 
taagttacag tggccaagac caggacaact ccggccatga gctaagtcct gcctaccttc 
cccaaaacat tcccatgtcc tcacaggcta ggatgcagat gttggttgga gaggaatttg 
tgtgtgtgtg tgtgtgtgtg tgtgttttct tgcctgacct cagtttcatg gatgaaaagt 
ggaagccaca gaattatttt caaaaataaa ggctgaattg tctgaaaaaa aaaaaaaaaa 
aaaaaaccgg aattc. 

10. A plasmid, comprising a DNA sequence encoding a protein 
having the amino acid sequence of formula II: 

MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE 
LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR 
ARG PRO ALA PRO GLY ARG PRO PRO SER VAL SER ALA LEU ASP GLY 
ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP 
ALA GLU VAL GLU LEU GLU ARG ARG ARG GLY LEU LEU GLN GLN ILE 
GLY ASP ALA LEU SER SER GLN ARG GLY ARG VAL PRO THR ALA ALA 
PRO PRO ALA GLN PRO ARG VAL PRO VAL THR PRO ALA PRO ALA VAL 
ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL ARG ARG 
CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU LEU PHE 
PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR ALA GLN 
ALA ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG GLN PRO 
ASP LEU SER SER ILE ALA VAL PRO PRO ASP HIS ARG LYS PHE GLN 
GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU GLY GLN 
VAL PHE ARG GLN PHE ARG PHE PRO ALA ALA VAL VAL VAL GLU ASP 
ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE ARG ALA THR 
TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL SER ALA 
TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP ALA SER ARG PRO 
GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY TRP LEU 
LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP PRO LYS 
ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG GLN GLY 
ARG ALA CYS ILE ARG PRO GLU ILE SER ARG THR MET THR PHE GLY 
ARG LYS GLY VAL THR HIS GLY GLN PHE PHE ASP GLN HIS LEU LYS 

atgctgaa gaagcagtct gcagggcttg tgctgtgggg cgctatcctc tttgtggcct 
ggaatgccct gctgctcctc ttcttctgga cgcgcccagc acctggcagg ccaccctcag 
tcagcgctct cgatggcgac cccgccagcc tcacccggga agtgattcgc ctggcccaag 
acgccgaggt ggagctggag cgcaggcgtg ggctgctgca gcagatcggg gatgccctgt
cgagccagcg ggggagggtg cccaccgcgg cccctcccgc ccagccgcgt gtgcctgtga 
cccccgcgcc ggcggtgatt cccatcctgg tcatcgcctg tgaccgcagc actgttcggc
gctgcctgga caagctgctg cattatcggc cctcggctga gctcttcccc atcatcgtta 
gccaggactg cgggcacgag gagacggccc aggccatcgc ctcctacggc agcgcggtca 
cgcacatccg gcagcccgac ctgagcagca ttgcggtgcc gccggaccac cgcaagttcc 
agggctacta caagatcgcg cgccactacc gctgggcgct gggccaggtc ttccggcagt
ttcgcttccc cgcggccgtg gtggtggagg atgacctgga ggtggccccg gacttcttcg
agtactttcg ggccacctat ccgctgctga aggccgaccc ctccctgtgg tgcgtctcgg
cctggaatga caacggcaag gagcagatgg tggacgccag caggcctgag ctgctctacc 
gcaccgactt tttccctggc ctgggctggc tgctgttggc cgagctctgg gctgagctgg 
agcccaagtg gccaaaggcc ttctgggacg actggatgcg gcggccggag cagcggcagg 
ggcgggcctg catacgccct gagatctcaa gaacgatgac ctttggccgc aagggtgtga 
cgcacgggca gttctttgac cagcacctca agtttatcaa gctgaaccag cagtttgtgc
acttcaccca gctggacctg tcttacctgc agcgggaggc ctatgaccga gatttcctcg 
cccgcgtcta cggtgctccc cagctgcagg tggagaaagt gaggaccaat gaccggaagg 
agctggggga ggtgcgggtg cagtatacgg ggagggacag cttcaaggct ttcgccaagg 
ctctgggtgt tatggatgac cttaagtcgg gggttccgag agctggctac cggggtattg 
tcaccttcca gttccggggc cgccgtgtcc acctggcgcc cccaccgacg tgggagggct 
atgatcctag ctggaat. 

12. The plasmid of Claim 10, wherein said DNA sequence has
the formula VI:

aagttttgaa tgtttaagtt tatttaagtt tatttctaaa tattttctca tttctctggc 
ttttgtaagt agggttttct catccatgtt ttcttctcat gagttatttg tggatatgaa 
ggctatccat tagtatatgt tgatttttat attacacttc cttgctcagt tcattattga 
ttctttttga gctttccagg catattctca caagtaaaga taatagaaat agcttgcttc 
ctttccactt ctgctttgaa tttttttttc ttggttcatt tgcattggct gcttcctcca 
gcaaaatgtt aaataaccct ggagatgatg ggcaacttcg ttttgctcct gacattcgtg 
gggtgcctct ggtgcttccc tgttggtaag gggttaactg tagccctgag gtgggacatt 
tgattttaaa aatcagtcat cttggggcgc ttaggttaga ggaatggtag gcagatgctg 
tcactccttg cccctcccct cctccttccc acctggaggg gaaatgaaat ctgacaggta 
gaaagagggg agttggggtt ctttttctct ctccctccac cagcatcact ctctgcctct 
ccctcaaaaa tacgttcctg ggtcaggata tatgttgact ccctagagag ctctggagtc 
aacctcctgg ccttcctcca ccctcactct tggccttttc ctgcccccat ttcctctacc 
tgtggggcat ggagccacga gcctttgtgt gacggtttgc tttctctctc ctgtctttag 
gtgcatggct gcctcctaat cccatagtcc agaggaggca tccctaggac tgcgggcaag 
ggagccgcaa gcccagggca gccttgaacc gtcccctggc ctgccctccg gtgggggcca 
ggatgctgaa gaagcagtct gcagggcttg tgctgtgggg cgctatcctc tttgtggcct
ggaatgccct gctgctcctc ttcttctgga cgcgcccagc acctggcagg ccaccctcag
tcagcgctct cgatggcgac cccgccagcc tcacccggga agtgattcgc ctggcccaag
acgccgaggt ggagctggag cgcaggcgtg ggctgctgca gcagatcggg gatgccctgt 
cgagccagcg ggggagggtg cccaccgcgg cccctcccgc ccagccgcgt gtgcctgtga 
cccccgcgcc ggcggtgatt cccatcctgg tcatcgcctg tgaccgcagc actgttcggc 
gctgcctgga caagctgctg cattatcggc cctcggctga gctcttcccc atcatcgtta
gccaggactg cgggcacgag gagacggccc aggccatcgc ctcctacggc agcgcggtca 
cgcacatccg gcagcccgac ctgagcagca ttgcggtgcc gccggaccac cgcaagttcc 
agggctacta caagatcgcg cgccactacc gctgggcgct gggccaggtc ttccggcagt 
ttcgcttccc cgcggccgtg gtggtggagg atgacctgga ggtggccccg gacttcttcg
agtactttcg ggccacctat ccgctgctga aggccgaccc ctccctgtgg tgcgtctcgg 
cctggaatga caacggcaag gagcagatgg tggacgccag caggcctgag ctgctctacc 
gcaccgactt tttccctggc ctgggctggc tgctgttggc cgagctctgg gctgagctgg 
agcccaagtg gccaaaggcc ttctgggacg actggatgcg gcggccggag cagcggcagg
ggcgggcctg catacgccct gagatctcaa gaacgatgac ctttggccgc aagggtgtga
cgcacgggca gttctttgac cagcacctca agtttatcaa gctgaaccag cagtttgtgc 
acttcaccca gctggacctg tcttacctgc agcgggaggc ctatgaccga gatttcctcg 
cccgcgtcta cggtgctccc cagctgcagg tggagaaagt gaggaccaat gaccggaagg 
agctggggga ggtgcgggtg cagtatacgg ggagggacag cttcaaggct ttcgccaagg 
ctctgggtgt tatggatgac cttaagtcgg gggttccgag agctggctac cggggtattg 
tcaccttcca gttccggggc cgccgtgtcc acctggcgcc cccaccgacg tgggagggct 
atgatcctag ctggaattag cacctgcctg tccttcctgg gccccttctt gccacatcat
gagctgaggt gaccacagtc cccaggctgc atcggcctgc ctgtgtttcc ctcttaggtg
catttatctt tttgattttt ccgagtggca tttaagtgca caaatgataa caagaggatt
attctcccgt tctcaaggga gtcagatcag gggaactatt ctagggtatg ttgcggggta 
ttaagcagga aaacactgtg tggtgggggg cactgggctt gttggggcca caaatgtcca 
cgtcctgagc tttctcctgg agcatgtgca gagagtttgg caacgttcgc tctcttgacc
agaccccttc tccctgactg gctcttccag ccaggcacga gccctccttc tatacctgct 
ccccttccca gtggggactg agttatggga gaaggggaca tatttgtggc caaaatgata
ctaaccaaag gggcttcctt gtcagggcct ggtggagttg gtgggtcatc ggggctcact
gcctcctgcc cttctctcct gtctgacccc cacttagccc ttctctcctt gcagcctagc 
agtttatagt tctgagatgg aaagttgaag ggggcaagca agacctctcc tcagcccatg
cccagctgtc aggagagagg tgcagggagg aaggccttgt gctgggacaa cctctctctt 
gccttacctt cagagaggac tatgccctga cccctccttt ctgaaaatca gtgccctccc
tgttgctcta ggaggctcct gctggcttgg tagaagacag aattcgatct gcctgtccct 
ttttcccctg gggtttgaca cacaggctcc tctcagcatg aggtggagca gtgaccaggt 
ggagcagtga ccaggacgcc tctggcccag tgctgcccag cctccccgcc cgctcccagg
cgccccatgt cctcacaggc caggacgcca tggcggccgg gagcatgcga. 

13. A transformed cell, containing a heterologous sequence
of DNA encoding a protein having the amino acid sequence of 
formula I. 

14. The transformed cell of Claim 13, wherein said
heterologous DNA sequence has the formula III. 

15. The transformed cell of Claim 13, wherein said 
heterologous DNA sequence has the formula IV. 

16. A transformed cell, containing a heterologous sequence 
of DNA encoding a protein having the amino acid sequence of 
formula II.

17. The transformed cell of Claim 16, wherein said
heterologous DNA sequence has the formula V.

18. The transformed cell of Claim 16, wherein said
heterologous DNA sequence has the formula VI.

19. A method for preparing a glycoprotein which is a 
complex or hybrid N-glycan, comprising: 

culturing a cell which produces a precursor 
high-mannose glycoprotein and which contains a heterologous
DNA sequence which encodes a protein having the amino acid
sequence of formula I.

20. The method of Claim 19, wherein said heterologous DNA 
sequence has the formula III. 

21. The method of Claim 19, wherein said heterologous DNA 
sequence has the formula IV. 

22. A method for preparing a glycoprotein which is a 
complex or hybrid N-glycan, comprising: 

culturing a cell, which produces a precursor 
high-mannose glycoprotein and which contains a heterologous
DNA sequence which encodes a protein having the amino acid 
sequence of formula II. 

23. The method of Claim 22, wherein said heterologous DNA 
sequence has the formula V. 

24. The method of Claim 23, wherein said heterologous DNA 
sequence has the formula VI. 






WO 92/09694    PCT/CA91/00417

1/10

Peptide 1:

1        10        20        30
WALGQIFHNFNYPAAVVVEDDLEVAPDFFEYfq

Peptide 2:

1        10
LWAELEPKWPKa

Peptide 3:

1        10
FWDDWMRRPEQ

Peptide 4:

1
TDFFPe

Peptide 5:

1        10
DLSYLQQEAYDRDFl

Peptide 6:

1        10        20
LFKGRRVHLAPPQTWDGYDPSWt

Peptide 7:

1
LGWL

Peptide 8:

1
ATYPL

Oligonucleotides:

2S: 5'-TGG GCI GAA CTI GAA CCI AAA TGG-3'
                 G T     G       G

2A: 5'-CCA TTT IGG TTC IAG TTC IGG CCA-3'
           C       C     A C

3S: 5'-TTT TGG GAT GAT TGG ATG CG-3'
         C       C   C         A

3A: 5'-CG CAT CCA ATC ATC CCA AAA-3'
        T         G   G       G

6S: 5'-CAA ACI TGG GAT GGI TAT GAT CC-3'
         G           C       C   C

6A: 5'-GG ATC ATA ICC ATC CCA IGT TTG-3'

          G   G       G           C
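
The sense (S) and antisense (A) oligonucleotides listed above are expected to be reverse complements of one another. The following is a minimal Python sketch of that check for the 3S/3A pair. It ignores the alternative bases written beneath each oligonucleotide, the helper name `reverse_complement` is hypothetical, and leaving inosine (I) unchanged on complementation is an assumption made only so that inosine-containing primers can be passed through.

```python
# Watson-Crick complements. Mapping inosine (I) to itself is an assumption
# of this sketch, not something stated in the figure.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "I": "I"}


def reverse_complement(oligo: str) -> str:
    """Return the reverse complement of an oligonucleotide written 5'->3'."""
    bases = oligo.replace(" ", "").upper()
    return "".join(COMPLEMENT[b] for b in reversed(bases))


oligo_3s = "TTT TGG GAT GAT TGG ATG CG"  # 3S as listed, alternatives ignored
oligo_3a = "CG CAT CCA ATC ATC CCA AAA"  # 3A as listed, alternatives ignored

is_pair = reverse_complement(oligo_3s) == oligo_3a.replace(" ", "")
print(is_pair)
```

The same function can be applied to the 2S/2A and 6S/6A pairs to see whether the printed antisense strand agrees with the sense strand at every position.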
[Figure: image not reproduced; lane labels not recoverable; size markers 500bp and 450bp]




      gaattccggc aagtcatacc tttgcctgcc ctcccctgtg ggggccagg

  50: atg ctg aag aag cag tct gct ggg ctt gtg ctg tgg ggt gct atc
      MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE

  95: ctc ttt gtg gcc tgg aat gcc ctg ctg ctc ctc ttc ttc tgg aca
      LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR

 140: cgt cca gtg cct agc agg ctg ccg tca gac aat gct ctc gat gat
      ARG PRO VAL PRO SER ARG LEU PRO SER ASP ASN ALA LEU ASP ASP

 185: gac cct gcc agc ctc acc cgt gag gtg atc cgc tta gct cag gat
      ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP

 230: gcc gag gta gag ttg gaa cgt cag cgg gga ctg ttg cag cag att
      ALA GLU VAL GLU LEU GLU ARG GLN ARG GLY LEU LEU GLN GLN ILE

 275: agg gag cac cat gct ctt tgg agc cag cgg tgg aag gtg cct act
      ARG GLU HIS HIS ALA LEU TRP SER GLN ARG TRP LYS VAL PRO THR

 320: gca gcc cct cct gct cag ccg cat gtg cct gtg acc cca ccg cca
      ALA ALA PRO PRO ALA GLN PRO HIS VAL PRO VAL THR PRO PRO PRO

 365: gct gtg atc ccc atc ctg gta att gcc tgt gac cgc agc acc gtc
      ALA VAL ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL

 410: cgc cgc tgt ttg gac aag cta ctg cat tat cgg cct tca gct gag
      ARG ARG CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU

 455: ctg ttc ccc atc att gtc agc cag gac tgt ggg cat gag gag aca
      LEU PHE PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR

 500: gcc cag gtc att gct tcc tat ggc agc gca gtc aca cac atc cgg
      ALA GLN VAL ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG

 545: caa cct gac ctg agc aac att gct gtg cag ccc gac cac cgc aag
      GLN PRO ASP LEU SER ASN ILE ALA VAL GLN PRO ASP HIS ARG LYS

 590: ttc cag ggc tac tac aag atc gca cgg cat tac cgc tgg gca ttg
      PHE GLN GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU

 635: ggc caa atc ttc cac aat ttc aac tac cca gca gct gtg gtg gtg
      GLY GLN ILE PHE HIS ASN PHE ASN TYR PRO ALA ALA VAL VAL VAL

 680: gaa gat gat ctc gag gtg gca cca gac ttc ttt gag tac ttc cag
      GLU ASP ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE GLN

 725: gcc act tac cca ctg ttg aaa gca gac ccc tcc ctc tgg tgt gtg
      ALA THR TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL

 770: tct gcc tgg aat gac aat ggc aaa gaa cag atg gta gac tcg agt
      SER ALA TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP SER SER

 815: aag cca gag tta ctc tac cgc aca gat ttc ttt cct ggc tta ggc
      LYS PRO GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY

FIGURE 4

 860: tgg tta ctg ttg gct gaa ctc tgg gct gaa ctg gag ccc aag tgg
      TRP LEU LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP

 905: ccc aaa gcc ttc tgg gat gac tgg atg cgc cgg cct gag cag cga
      PRO LYS ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG

 950: aag ggg agg gcc tgt gtg cgt cca gaa atc tca aga aca atg aca
      LYS GLY ARG ALA CYS VAL ARG PRO GLU ILE SER ARG THR MET THR

 995: ttt ggc cgg aag ggt gtg agc cat ggg cag ttc ttt gac cag cat
      PHE GLY ARG LYS GLY VAL SER HIS GLY GLN PHE PHE ASP GLN HIS

1040: ctc aag ttc atc aag ctg aac cag cag ttt gta ccc ttc acc cag
      LEU LYS PHE ILE LYS LEU ASN GLN GLN PHE VAL PRO PHE THR GLN

1085: ctg gac ctg tcg tac ctt cag cag gag gcc tat gac cgg gat ttc
      LEU ASP LEU SER TYR LEU GLN GLN GLU ALA TYR ASP ARG ASP PHE

1130: ctt gct cgt gtt tat ggt gct ccc cag tta cag gtg gag aaa gtg
      LEU ALA ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL

1175: agg acc aat gac cgg aag gag cta gga gag gtg cgc gta cag tac
      ARG THR ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR

1220: aca ggc agg gac agc ttc aag gct ttc gcc aag gcc ctg ggt gtc
      THR GLY ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL

1265: atg gat gac ctc aaa tca ggt gta ccc agg gct gga tac cgg ggc
      MET ASP ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY

1310: att gtc acc ttc tta ttc cgg ggc cgc cgt gtc cac ctg gcg ccc
      ILE VAL THR PHE LEU PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO

1355: cct cag act tgg gat ggc tat gat cct agt tgg act
      PRO GLN THR TRP ASP GLY TYR ASP PRO SER TRP THR



1391: taacagctcc tgcctgtccc ttctgggctc cttccttgca atttcatgat ctaagatggg
1451: accgtagtcc ctgggctgca ttgtcttttc tgtctttccc tcttgggtcc attttttttt
1511: ttttcttttt tgagtggcat ttgaatacac agatgacaag gtgagggttc ttttgttaaa
1571: ggagttagat cagggaaagc attctgctgt ctgttgggta tcaagcagca aaccactgtg
1631: tgatagggga agaatgggct ttttggggcc agaaatatcc atgttctgag tttttctctt
1691: aggtcatctg cagaggagtt ggcaacttta gctttcttaa ccaggccttt tctttctgac
1751: ctgagagcca gggcatgaga cttcttgttc atgctccttt ttaccttccc ctaataaggg
1811: tctgggctac aggagaagtg aacatattgt ggccagaata atactaacca g^ggggcctc
1871: attgtcagag tctaggtgca gttattgggt tgtcagagtt aatgccttct gttcttcttt
1931: ccttattcct gacttctgtc agctcttctt tctttgcagc ctagcaattt ttggttctaa
1991: gatgaaaaat gaagaggaaa agaaatattc gcacccagct attgggagaa aggtagtggg
2051: aaaaaaactt cattgtacca cttcaaagag acactcttga cctcttcctt tctaaaaatt
2111: agtcccctcc ctgttgcttc aggagaatgc tgtgctggtc agttctgtgt gatccttctt
2171: ccctgagttt tatacacagg ctcctcccta aggctgtggc ttctggtggc cctcctgaca
2231: taagttacag tggccaagac caggacaact ccggccatga gctaagtcct gcctaccttc
2291: tccaaaacat tcccatgtcc tcacaggcta ggatgcagat gttggttgga gaggaatttg
2351: tgtgtgtgtg tgtgtgtgtg tgtgttttct tgcctgacct cagtttcatg gatgaaaagt
2411: ggaagctaca gaattatttt caaaaataaa ggctgaattg tctgaaaaaa aaaaaaaaaa
2471: aaaaaaccgg aattc
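
The pairing of codons and three-letter residues in the rabbit sequence above can be checked mechanically. The following is a minimal Python sketch, assuming the standard genetic code; the names `CODON_TABLE`, `THREE_LETTER` and `translate` are hypothetical. It translates the last coding line (position 1355, with its first codon read as cct to agree with the PRO printed beneath it) followed by the first six bases of the untranslated region (position 1391), and stops at the first stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order
# (first base varies slowest, third base fastest).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter residue names, upper case as in the figures.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def translate(dna: str) -> list[str]:
    """Translate DNA (spaces allowed) in frame 1 until the first stop codon."""
    seq = "".join(dna.lower().split())
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the open reading frame
            break
        residues.append(THREE_LETTER[aa])
    return residues


last_coding_line = "cct cag act tgg gat ggc tat gat cct agt tgg act"
utr_start = "taacag"  # first six bases of the line numbered 1391

print(" ".join(translate(last_coding_line + utr_start)))
```

Under these assumptions the translation ends at the taa triplet that opens the line numbered 1391, which is consistent with THR being the last residue shown.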
1 aagttttgaa tgtttaagtt tatttaagtt tatttctaaa tattttctca ttcctctggc
61 ttttgtaagt agggttttct catccatgtc tccttctcat gagttatttg tggatatgaa 
121 ggctatccat tagtatatgt tgatttttat attacacttc cttgctcagt tcattattga 
181 ttctttttga gttttccagg catattccca caagtaaaga taatagaaac agtttgcttc 
241 ctttccactt ctgcttcgaa tttttttctc ttggctcatt tgcattggct gcttcctcca 
301 gcaaaatgtt aaataaccct ggagatgatg ggcaacttcg ttttgctccc gacattcgtg
361 gggtgcctct ggtgcttccc tgttggtaag gggttaaccg tagccctgag gtgggacatt
421 tgattttaaa aatcagccat cttggggcgc ctaggttaga ggaatggtag gcagatgccg
481 tcactccttg cccctcccct cctccttccc acctggaggg gaaatgaaat ctgacaggta 
541 gaaagagggg agttggggtt ctttttctct ctccctccac cagcatcact ctctgcctct 
601 ccctcaaaaa tacgttcctg ggtcaggata tatgttgact ccctagagag ctctggagcc 
661 aacctcctgg ccttcctcca ccctcactct tggccttttc ctgcccccat ctcctctacc 
721 tgtggggcat ggagccacga gcctttgtgt gacggtttgc tttctctctc ctgtctttag 
781 gtgcatggct gcctcctaat cccatagtcc agaggaggca tccctaggac tgcgggcaag 
841 ggagccgcaa gcccagggca gccttgaacc gtcccctggc ctgccctccg gtgggggcca 
901 ggatgctgaa gaagcagtct gcagggcttg tgctgtgggg cgctatcctc tttgtggcct 
961 ggaatgccct gctgctcctc ttcttctgga cgcgcccagc acctggcagg ccaccctcag
1021 tcagcgctct cgatggcgac cccgccagcc tcacccggga agtgattcgc ctggcccaag
1081 acgccgaggt ggagctggag cgcaggcgtg ggctgctgca gcagatcggg gatgccctgt
1141 cgagccagcg ggggagggtg cccaccgcgg cccctcccgc ccagccgcgt gtgcctgtga
1201 cccccgcgcc ggcggtgatt cccatcctgg tcatcgcctg tgaccgcagc actgttcggc
1261 gctgcctgga caagctgctg cattatcggc cctcggctga gctcttcccc atcatcgtta
1321 gccaggactg cgggcacgag gagacggccc aggccatcgc ctcctacggc agcgcggtca 
1381 cgcacatccg gcagcccgac ctgagcagca ttgcggtgcc gccggaccac cgcaagttcc
1441 agggctacta caagatcgcg cgccactacc gctgggcgct gggccaggtc ttccggcagt 
1501 ttcgcttccc cgcggccgtg gtggtggagg atgacctgga ggtggccccg gacttcttcg 
1561 agtactttcg ggccacctat ccgctgctga aggccgaccc ctccctgtgg tgcgtctcgg 
1621 cctggaatga caacggcaag gagcagatgg tggacgccag caggcctgag ctgctctacc 
1681 gcaccgactt tttccctggc ctgggctggc tgctgttggc cgagctctgg gctgagctgg 
1741 agcccaagtg gccaaaggcc ttctgggacg actggatgcg gcggccggag cagcggcagg
1801 ggcgggcctg catacgccct gagatctcaa gaacgatgac ctttggccgc aagggtgtga
1861 cgcacgggca gttctttgac cagcacctca agtttatcaa gctgaaccag cagtttgtgc
1921 acttcaccca gctggacctg tcttacctgc agcgggaggc ctatgaccga gatttcctcg
1981 cccgcgtcta cggtgctccc cagctgcagg tggagaaagt gaggaccaat gaccggaagg
2041 agctggggga ggtgcgggtg cagtatacgg ggagggacag cttcaaggct ttcgccaagg
2101 ctctgggtgt tatggatgac cttaagtcgg gggttccgag agctggctac cggggtattg
2161 tcaccttcca gttccggggc cgccgtgtcc acctggcgcc cccaccgacg tgggagggct
2221 atgatcctag ctggaattag cacctgcctg tccttcctgg gccccttctt gccacatcat 
2281 gagctgaggt gaccacagtc cccaggctgc atcggcctgc ctgtgtttcc ctcttaggtg 
2341 catttatctt tttgattttt ccgagtggca tttaagtgca caaatgataa caagaggatt
2401 attctcccgt tctcaaggga gtcagatcag gggaactatt ctagggtatg ttgcggggta
2461 ttaagcagga aaacactgcg tggtgggggg cactgggctt gttggggcca caaatgtcca 
2521 cgtcctgagc tttctcctgg agcatgtgca gagagtttgg caacgttcgc tctcttgacc 
2581 agaccccttc tccctgactg gctcttccag ccaggcacga gccctccttc tatacctgct 
2641 ccccttccca gtggggactg agttatggga gaaggggaca tatttgtggc caaaatgata 
2701 ctaaccaaag gggcttcctt gtcagggcct ggtggagttg gtgggtcatc ggggctcact
2761 gcctcctgcc cttctctcct gtctgacccc cacttagccc ttctctcctt gcagcctagc 
2821 agtttatagt tctgagatgg aaagttgaag ggggcaagca agacctctcc tcagcccatg 
2881 cccagctgtc aggagagagg tgcagggagg aaggccttgt gctgggacaa cctctctctt
2941 gccttacctt cagagaggac tatgccctga cccctccttt ctgaaaatca gtgccctccc
3001 tgttgctcta ggaggctcct gctggcttgg tagaagacag aattcgatct gcctgtccct
3061 ttttcccctg gggtttgaca cacaggctcc tctcagcatg aggtggagca gtgaccaggt
3121 ggagcagtga ccaggacgcc tctggcccag tgctgcccag cctccccgcc cgctcccagg
3181 cgccccatgt cctcacaggc caggacgcca tggcggccgg gagcatgcga

1: MET LEU LYS LYS GLN SER ALA GLY LEU VAL LEU TRP GLY ALA ILE

16: LEU PHE VAL ALA TRP ASN ALA LEU LEU LEU LEU PHE PHE TRP THR 

31: ARG PRO ALA PRO GLY ARG PRO PRO SER VAL SER ALA LEU ASP GLY 

46: ASP PRO ALA SER LEU THR ARG GLU VAL ILE ARG LEU ALA GLN ASP 

61: ALA GLU VAL GLU LEU GLU ARG ARG ARG GLY LEU LEU GLN GLN ILE 

76: GLY ASP ALA LEU SER SER GLN ARG GLY ARG VAL PRO THR ALA ALA

91: PRO PRO ALA GLN PRO ARG VAL PRO VAL THR PRO ALA PRO ALA VAL 

106: ILE PRO ILE LEU VAL ILE ALA CYS ASP ARG SER THR VAL ARG ARG

121: CYS LEU ASP LYS LEU LEU HIS TYR ARG PRO SER ALA GLU LEU PHE 

136: PRO ILE ILE VAL SER GLN ASP CYS GLY HIS GLU GLU THR ALA GLN 

151: ALA ILE ALA SER TYR GLY SER ALA VAL THR HIS ILE ARG GLN PRO 

166: ASP LEU SER SER ILE ALA VAL PRO PRO ASP HIS ARG LYS PHE GLN 

181: GLY TYR TYR LYS ILE ALA ARG HIS TYR ARG TRP ALA LEU GLY GLN 

196: VAL PHE ARG GLN PHE ARG PHE PRO ALA ALA VAL VAL VAL GLU ASP

211: ASP LEU GLU VAL ALA PRO ASP PHE PHE GLU TYR PHE ARG ALA THR 

226: TYR PRO LEU LEU LYS ALA ASP PRO SER LEU TRP CYS VAL SER ALA 

241: TRP ASN ASP ASN GLY LYS GLU GLN MET VAL ASP ALA SER ARG PRO 

256: GLU LEU LEU TYR ARG THR ASP PHE PHE PRO GLY LEU GLY TRP LEU 

271: LEU LEU ALA GLU LEU TRP ALA GLU LEU GLU PRO LYS TRP PRO LYS 

286: ALA PHE TRP ASP ASP TRP MET ARG ARG PRO GLU GLN ARG GLN GLY 

301: ARG ALA CYS ILE ARG PRO GLU ILE SER ARG THR MET THR PHE GLY 

316: ARG LYS GLY VAL THR HIS GLY GLN PHE PHE ASP GLN HIS LEU LYS

331: PHE ILE LYS LEU ASN GLN GLN PHE VAL HIS PHE THR GLN LEU ASP

346: LEU SER TYR LEU GLN ARG GLU ALA TYR ASP ARG ASP PHE LEU ALA 

361: ARG VAL TYR GLY ALA PRO GLN LEU GLN VAL GLU LYS VAL ARG THR 

376: ASN ASP ARG LYS GLU LEU GLY GLU VAL ARG VAL GLN TYR THR GLY 

391: ARG ASP SER PHE LYS ALA PHE ALA LYS ALA LEU GLY VAL MET ASP 

406: ASP LEU LYS SER GLY VAL PRO ARG ALA GLY TYR ARG GLY ILE VAL 

421: THR PHE GLN PHE ARG GLY ARG ARG VAL HIS LEU ALA PRO PRO PRO

436: THR TRP GLU GLY TYR ASP PRO SER TRP ASN***

                                     START
 880: cc tgc cct ccg gtg ggg gcc agg atg ctg aag aag cag tct gca
      .. CYS PRO PRO VAL GLY ALA ARG MET LEU LYS LYS GLN SER ALA

 924: ggg ctt gtg ctg tgg ggc gct atc ctc ttt gtg gcc tgg aat gcc
   3: GLY LEU VAL LEU TRP GLY ALA ILE LEU PHE VAL ALA TRP ASN ALA

 969: ctg ctg ctc ctc ttc ttc tgg acg cgc cca gca cct ggc agg cca
   3: LEU LEU LEU LEU PHE PHE TRP THR ARG PRO ALA PRO GLY ARG PRO

1014: ccc tca gtc agc gct ctc gat ggc gac ccc gcc agc ctc acc cgg
   3: PRO SER VAL SER ALA LEU ASP GLY ASP PRO ALA SER LEU THR ARG

1059: gaa gtg att cgc ctg gcc caa gac gcc gag gtg gag ctg gag cgc
   3: GLU VAL ILE ARG LEU ALA GLN ASP ALA GLU VAL GLU LEU GLU ARG

1104: agg cgt ggg ctg ctg cag cag atc ggg gat gcc ctg tcg agc cag
   3: ARG ARG GLY LEU LEU GLN GLN ILE GLY ASP ALA LEU SER SER GLN

1149: cgg ggg agg gtg ccc acc gcg gcc cct ccc gcc cag ccg cgt gtg
   3: ARG GLY ARG VAL PRO THR ALA ALA PRO PRO ALA GLN PRO ARG VAL

1194: cct gtg acc ccc gcg ccg gcg gtg att ccc atc ctg gtc atc gcc
   3: PRO VAL THR PRO ALA PRO ALA VAL ILE PRO ILE LEU VAL ILE ALA

1239: tgt gac cgc agc act gtt cgg cgc tgc ctg gac aag ctg ctg cat
   3: CYS ASP ARG SER THR VAL ARG ARG CYS LEU ASP LYS LEU LEU HIS

1284: tat cgg ccc tcg gct gag ctc ttc ccc atc atc gtt agc cag gac
   3: TYR ARG PRO SER ALA GLU LEU PHE PRO ILE ILE VAL SER GLN ASP

1329: tgc ggg cac gag gag acg gcc cag gcc atc gcc tcc tac ggc agc
   3: CYS GLY HIS GLU GLU THR ALA GLN ALA ILE ALA SER TYR GLY SER

1374: gcg gtc acg cac atc cgg cag ccc gac ctg agc agc att gcg gtg
   3: ALA VAL THR HIS ILE ARG GLN PRO ASP LEU SER SER ILE ALA VAL

1419: ccg ccg gac cac cgc aag ttc cag ggc tac tac aag atc gcg cgc
   3: PRO PRO ASP HIS ARG LYS PHE GLN GLY TYR TYR LYS ILE ALA ARG

1464: cac tac cgc tgg gcg ctg ggc cag gtc ttc cgg cag ttt cgc ttc
   3: HIS TYR ARG TRP ALA LEU GLY GLN VAL PHE ARG GLN PHE ARG PHE

1509: ccc gcg gcc gtg gtg gtg gag gat gac ctg gag gtg gcc ccg gac
   3: PRO ALA ALA VAL VAL VAL GLU ASP ASP LEU GLU VAL ALA PRO ASP

1554: ttc ttc gag tac ttt cgg gcc acc tat ccg ctg ctg aag gcc gac
   3: PHE PHE GLU TYR PHE ARG ALA THR TYR PRO LEU LEU LYS ALA ASP

1599: ccc tcc ctg tgg tgc gtc tcg gcc tgg aat gac aac ggc aag gag
   3: PRO SER LEU TRP CYS VAL SER ALA TRP ASN ASP ASN GLY LYS GLU

1644: cag atg gtg gac gcc agc agg cct gag ctg ctc tac cgc acc gac
   3: GLN MET VAL ASP ALA SER ARG PRO GLU LEU LEU TYR ARG THR ASP

FIGURE 8

1689: ttt ttc cct ggc ctg ggc tgg ctg ctg ttg gcc gag ctc tgg gct
   3: PHE PHE PRO GLY LEU GLY TRP LEU LEU LEU ALA GLU LEU TRP ALA

1734: gag ctg gag ccc aag tgg cca aag gcc ttc tgg gac gac tgg atg
   3: GLU LEU GLU PRO LYS TRP PRO LYS ALA PHE TRP ASP ASP TRP MET

1779: cgg cgg ccg gag cag cgg cag ggg cgg gcc tgc ata cgc cct gag
   3: ARG ARG PRO GLU GLN ARG GLN GLY ARG ALA CYS ILE ARG PRO GLU

1824: atc tca aga acg atg acc ttt ggc cgc aag ggt gtg acg cac ggg
   3: ILE SER ARG THR MET THR PHE GLY ARG LYS GLY VAL THR HIS GLY

1869: cag ttc ttt gac cag cac ctc aag ttt atc aag ctg aac cag cag
   3: GLN PHE PHE ASP GLN HIS LEU LYS PHE ILE LYS LEU ASN GLN GLN

1914: ttt gtg cac ttc acc cag ctg gac ctg tct tac ctg cag cgg gag
   3: PHE VAL HIS PHE THR GLN LEU ASP LEU SER TYR LEU GLN ARG GLU

1959: gcc tat gac cga gat ttc ctc gcc cgc gtc tac ggt gct ccc cag
   3: ALA TYR ASP ARG ASP PHE LEU ALA ARG VAL TYR GLY ALA PRO GLN

2004: ctg cag gtg gag aaa gtg agg acc aat gac cgg aag gag ctg ggg
   3: LEU GLN VAL GLU LYS VAL ARG THR ASN ASP ARG LYS GLU LEU GLY

2049: gag gtg cgg gtg cag tat acg ggg agg gac agc ttc aag gct ttc
   3: GLU VAL ARG VAL GLN TYR THR GLY ARG ASP SER PHE LYS ALA PHE

2094: gcc aag gct ctg ggt gtt atg gat gac ctt aag tcg ggg gtt ccg
   3: ALA LYS ALA LEU GLY VAL MET ASP ASP LEU LYS SER GLY VAL PRO

2139: aga gct ggc tac cgg ggt att gtc acc ttc cag ttc cgg ggc cgc
   3: ARG ALA GLY TYR ARG GLY ILE VAL THR PHE GLN PHE ARG GLY ARG

2184: cgt gtc cac ctg gcg ccc cca ccg acg tgg gag ggc tat gat cct
   3: ARG VAL HIS LEU ALA PRO PRO PRO THR TRP GLU GLY TYR ASP PRO
STOP 
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